diff --git a/design-document.md b/design-document.md
new file mode 100644
index 0000000..eb25407
--- /dev/null
+++ b/design-document.md
@@ -0,0 +1,929 @@
+Recommended approach
+
+Use:
+
+<input type="file">
+→ URL.createObjectURL(file)
+→ hidden <video>
+→ sample frames periodically
+→ draw sampled frame to canvas
+→ compare against previous accepted frame
+→ keep only significant-change frames
+→ user clicks Copy on selected image
+→ navigator.clipboard.write()
+
+URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.
+
+Do not analyze every frame at full resolution
+
+For change detection, use a small analysis canvas:
+
+video frame → draw to 160×90 or 320×180 canvas → compare pixels
+
+Then only render the full-resolution frame when you decide it is worth keeping.
+
+Example strategy:
+
+const ANALYSIS_W = 160;
+const ANALYSIS_H = 90;
+const SAMPLE_INTERVAL = 0.5; // seconds
+const CHANGE_THRESHOLD = 0.18; // tune this
+
+That means a 10-minute video sampled every 0.5 seconds is only:
+
+10 min × 60 / 0.5 = 1200 comparisons
+
+At 160×90, that is cheap.
+
+Simple frame-difference algorithm
+
+Use a downscaled canvas and compare luminance differences:
+
+function frameDifference(a, b) {
+  let changed = 0;
+  const pixels = a.data.length / 4;
+
+  for (let i = 0; i < a.data.length; i += 4) {
+    const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
+    const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];
+
+    if (Math.abs(lumA - lumB) > 32) changed++;
+  }
+
+  return changed / pixels;
+}
+
+Then:
+
+if (diff > CHANGE_THRESHOLD) {
+  // save this timestamp as a candidate key frame
+}
+Better heuristic
+
+Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.
+
+That avoids saving several frames during the same transition.
+
+Frame A accepted
+Frame B slightly different → reject
+Frame C slightly different → reject
+Frame D substantially different from A → accept
+
+Also add a minimum time gap:
+
+const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;
+Clipboard copying
+
+Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.
+
+Example:
+
+async function copyCanvasToClipboard(canvas) {
+  const blob = await new Promise(resolve =>
+    canvas.toBlob(resolve, "image/png")
+  );
+  // toBlob passes null when the canvas cannot be encoded
+  if (!blob) throw new Error("Canvas export failed");
+
+  await navigator.clipboard.write([
+    new ClipboardItem({
+      "image/png": blob
+    })
+  ]);
+}
+
+Important: this should happen from a user gesture, like clicking a Copy button.
+
+Memory profile
+
+This is light if implemented correctly:
+
+1 local video blob URL
+1 hidden video element
+1 small analysis canvas
+1 full-res preview canvas/image
+4–10 saved output blobs
+
+Avoid:
+
+Storing hundreds/thousands of full-res canvases
+Storing every sampled frame
+Using FileReader.readAsDataURL(video)
+Exporting a huge ZIP unless needed
+
+For your target, keeping 4–10 PNG/WebP blobs in memory is fine.
+
+Practical UX flow
+
+I would build it like this:
+
+User selects video.
+App scans with adjustable settings:
+  sample interval
+  change threshold
+  minimum seconds between captures
+  max captures
+App shows candidate thumbnails.
+User deletes bad picks if needed.
+Each thumbnail has:
+  Copy PNG
+  Download
+  timestamp
+Optional: button to rescan with stricter/looser threshold.
+Caveats
+Plain seeking may not be perfectly frame-accurate.
+Sampling every 0.25–1.0s is much more practical than decoding every frame.
+Browser codec support matters. MP4/H.264 is safest.
+Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.
+For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.
+My recommended V1 stack
+React/Vite
+HTMLVideoElement
+Canvas API
+Blob/Object URLs
+navigator.clipboard.write()
+Optional: localForage or IndexedDB if you want session persistence
+
+No backend required. No upload required.
+
+What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.
+
+I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.
+
+Recommendation
+
+Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:
+
+Vite + React + TypeScript
+HTMLVideoElement
+Canvas / OffscreenCanvas
+Web Worker
+Blob URLs
+Clipboard API
+
+For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a browser/WebAssembly port of FFmpeg, but many real-world reports and project issues revolve around large-file memory limits and browser stability.
+
+For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.
+
+Design Document: Local Video Key-Frame Extractor
+1. Goal
+
+Build a browser-only app that allows a user to:
+
+Select a local video file.
+Analyze it without uploading to a server.
+Detect frames where the image content has significantly changed.
+Present a small set of candidate key images.
+Let the user copy selected images to the clipboard.
+
+Target input:
+
+Video size: 100–1000 MB
+Expected output: 4–10 still images
+Processing model: local-only
+Backend: none required
+2. Recommended architecture
+User selects video
+        ↓
+Create local Blob URL
+        ↓
+Hidden <video> element loads local file
+        ↓
+Sampling engine seeks through video
+        ↓
+Downscaled analysis canvas compares frames
+        ↓
+Candidate detector stores timestamps
+        ↓
+Full-resolution renderer extracts selected frames
+        ↓
+Gallery shows thumbnails
+        ↓
+User copies PNG to clipboard
+
+Core browser APIs:
+
+Task                      API
+Local file selection      File API
+Local video loading       URL.createObjectURL()
+Video decoding/seeking    HTMLVideoElement
+Pixel comparison          Canvas API
+Off-main-thread analysis  Worker + OffscreenCanvas where available
+Image export              canvas.toBlob()
+Clipboard copy            navigator.clipboard.write() + ClipboardItem
+
+URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.
+
+3. Framework choice
+Best V1 stack
+Vite
+React
+TypeScript
+Zustand or simple React state
+Canvas API
+Web Worker
+
+Why:
+
+Vite: fast local dev, easy static deployment.
+React: good for gallery, controls, progress UI, and stateful review flow.
+TypeScript: useful because media processing code quickly accumulates edge cases.
+No backend: app can deploy as static files.
+No heavy media dependency initially: browser decoding is enough for threshold-based change detection.
+
+Suggested install:
+
+npm create vite@latest video-keyframe-extractor -- --template react-ts
+cd video-keyframe-extractor
+npm install
+npm run dev
+
+Optional dependencies:
+
+npm install zustand
+
+I would avoid extra UI frameworks unless you want polished controls quickly.
+
+4. Library/API decision matrix
+Option                   Use for V1?     Notes
+Native <video> + canvas  Yes             Best starting point. Simple, no server, no large WASM payload.
+Web Worker               Yes             Keeps UI responsive during scans.
+OffscreenCanvas          Yes, optional   Good for worker-side pixel analysis where supported.
+WebCodecs                V2              Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames.
+Mediabunny               V2 / advanced   Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed.
+ffmpeg.wasm              Avoid for V1    Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection.
+OpenCV.js                Probably avoid  Useful for advanced CV, but heavy for simple scene-difference detection.
+5. App modules
+5.1 File intake module
+
+Responsibilities:
+
+Accept local video file.
+Validate file type and size.
+Create blob URL.
+Load video metadata.
+Display duration, resolution, codec/browser compatibility status if possible.
+
+Example state:
+
+type LoadedVideo = {
+  file: File;
+  objectUrl: string;
+  duration: number;
+  width: number;
+  height: number;
+};
+
+Important cleanup:
+
+URL.revokeObjectURL(objectUrl);
+
+Never use this for the video:
+
+FileReader.readAsDataURL(file);
+
+That would load and base64-expand the whole video in memory.
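
The "validate file type and size" responsibility could be sketched as a small pure check that runs before any blob URL is created. This is only an illustration: `checkVideoFile`, `FileLike`, and `MAX_BYTES` are hypothetical names, and the 1000 MB ceiling simply mirrors the target input range rather than any browser limit.

```typescript
// Hypothetical ceiling, taken from the 100–1000 MB target range (an assumption, not a browser limit).
const MAX_BYTES = 1000 * 1024 * 1024;

// Structural subset of the DOM File type, so the check can be exercised without a browser.
type FileLike = { name: string; type: string; size: number };

type IntakeCheck = { ok: true } | { ok: false; reason: string };

function checkVideoFile(file: FileLike, maxBytes: number = MAX_BYTES): IntakeCheck {
  // Browsers fill in `type` from the file extension; it can be empty for unusual containers.
  if (file.type.indexOf("video/") !== 0) {
    return { ok: false, reason: `Not a video MIME type: ${file.type || "(empty)"}` };
  }
  if (file.size === 0) {
    return { ok: false, reason: "File is empty" };
  }
  if (file.size > maxBytes) {
    const mb = Math.round(file.size / (1024 * 1024));
    return { ok: false, reason: `File is ${mb} MB, above the configured limit` };
  }
  return { ok: true };
}
```

A real `File` from the picker satisfies `FileLike`, so the result can gate the call to URL.createObjectURL(). A MIME check like this says nothing about whether the browser can decode the codec, which still has to be detected when the video element loads.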
+
+5.2 Scan settings module
+
+Expose these controls:
+
+type ScanSettings = {
+  sampleIntervalSeconds: number; // default: 0.5
+  analysisWidth: number; // default: 160
+  analysisHeight: number; // default: 90
+  pixelDeltaThreshold: number; // default: 32
+  changedPixelRatioThreshold: number; // default: 0.18
+  minSecondsBetweenCaptures: number; // default: 2.0
+  maxCandidates: number; // default: 20
+  finalTargetCount: number; // default: 8
+};
+
+Suggested defaults:
+
+const defaultSettings: ScanSettings = {
+  sampleIntervalSeconds: 0.5,
+  analysisWidth: 160,
+  analysisHeight: 90,
+  pixelDeltaThreshold: 32,
+  changedPixelRatioThreshold: 0.18,
+  minSecondsBetweenCaptures: 2.0,
+  maxCandidates: 20,
+  finalTargetCount: 8,
+};
+5.3 Frame sampling module
+
+V1 should seek using a hidden video element:
+
+async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
+  return new Promise((resolve, reject) => {
+    const onSeeked = () => {
+      cleanup();
+      resolve();
+    };
+
+    const onError = () => {
+      cleanup();
+      reject(video.error);
+    };
+
+    const cleanup = () => {
+      video.removeEventListener("seeked", onSeeked);
+      video.removeEventListener("error", onError);
+    };
+
+    video.addEventListener("seeked", onSeeked, { once: true });
+    video.addEventListener("error", onError, { once: true });
+
+    video.currentTime = Math.min(time, video.duration);
+  });
+}
+
+Then draw into a small analysis canvas:
+
+function captureAnalysisFrame(
+  video: HTMLVideoElement,
+  canvas: HTMLCanvasElement,
+  width: number,
+  height: number
+): ImageData {
+  canvas.width = width;
+  canvas.height = height;
+
+  const ctx = canvas.getContext("2d", { willReadFrequently: true });
+  if (!ctx) throw new Error("Could not get canvas context");
+
+  ctx.drawImage(video, 0, 0, width, height);
+  return ctx.getImageData(0, 0, width, height);
+}
+6. Change detection algorithm
+6.1 Simple and effective V1
+
+Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.
+
+This prevents over-detecting long transitions.
+
+function frameDifferenceRatio(
+  a: ImageData,
+  b: ImageData,
+  pixelDeltaThreshold: number
+): number {
+  const dataA = a.data;
+  const dataB = b.data;
+  let changed = 0;
+  const pixels = dataA.length / 4;
+
+  for (let i = 0; i < dataA.length; i += 4) {
+    const lumA =
+      0.299 * dataA[i] +
+      0.587 * dataA[i + 1] +
+      0.114 * dataA[i + 2];
+
+    const lumB =
+      0.299 * dataB[i] +
+      0.587 * dataB[i + 1] +
+      0.114 * dataB[i + 2];
+
+    if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
+      changed++;
+    }
+  }
+
+  return changed / pixels;
+}
+
+Acceptance logic:
+
+function shouldAcceptCandidate(params: {
+  diffRatio: number;
+  currentTime: number;
+  lastAcceptedTime: number | null;
+  changedPixelRatioThreshold: number;
+  minSecondsBetweenCaptures: number;
+}) {
+  const {
+    diffRatio,
+    currentTime,
+    lastAcceptedTime,
+    changedPixelRatioThreshold,
+    minSecondsBetweenCaptures,
+  } = params;
+
+  if (diffRatio < changedPixelRatioThreshold) return false;
+
+  if (
+    lastAcceptedTime !== null &&
+    currentTime - lastAcceptedTime < minSecondsBetweenCaptures
+  ) {
+    return false;
+  }
+
+  return true;
+}
+6.2 Better V1.5: histogram difference
+
+Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.
+
+A better approach is to compare low-resolution color/luma histograms.
+
+Recommended hybrid score:
+
+score = 0.7 × luminance histogram difference
+      + 0.3 × pixel difference ratio
+
+This tends to better detect scene/content changes instead of tiny motion.
+
+6.3 Optional V2: perceptual hash
+
+For more stable detection:
+
+Generate a small grayscale frame, e.g. 32×32.
+Compute average hash or difference hash.
+Compare Hamming distance.
+Accept when distance exceeds threshold.
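
The perceptual-hash steps listed above can be sketched with a difference hash (dHash) over the RGBA bytes that getImageData() returns. This is a minimal illustration rather than a tuned detector: `lumaAt`, `differenceHash`, and `hammingDistance` are hypothetical helper names, and the caller is assumed to have already drawn the frame onto a very small canvas (dHash implementations often use 9×8).

```typescript
// Luma of pixel number `p` (pixel index, not byte index) in RGBA data.
function lumaAt(rgba: Uint8ClampedArray, p: number): number {
  const i = p * 4;
  return 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
}

// Difference hash: one bit per horizontally adjacent pixel pair,
// set to 1 when the left pixel is brighter than its right neighbour.
function differenceHash(rgba: Uint8ClampedArray, width: number, height: number): Uint8Array {
  const bits = new Uint8Array((width - 1) * height);
  let k = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width - 1; x++) {
      const left = lumaAt(rgba, y * width + x);
      const right = lumaAt(rgba, y * width + x + 1);
      bits[k++] = left > right ? 1 : 0;
    }
  }
  return bits;
}

// Number of bit positions where the two hashes disagree.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}
```

A frame would then be accepted when `hammingDistance(currentHash, lastAcceptedHash)` exceeds a threshold that has to be tuned per video type. Because the hash only encodes brightness ordering between neighbours, uniform brightness shifts and mild compression noise tend to leave it unchanged.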
+
+Perceptual hashing is a good fit when videos have subtle compression differences but meaningful visual changes.
+
+7. Scan workflow
+
+Pseudo-code:
+
+async function scanVideo(
+  video: HTMLVideoElement,
+  settings: ScanSettings,
+  onProgress: (progress: number) => void
+): Promise<CandidateFrame[]> {
+  const analysisCanvas = document.createElement("canvas");
+
+  const candidates: CandidateFrame[] = [];
+  let lastAcceptedFrame: ImageData | null = null;
+  let lastAcceptedTime: number | null = null;
+
+  for (
+    let time = 0;
+    time < video.duration;
+    time += settings.sampleIntervalSeconds
+  ) {
+    await seekVideo(video, time);
+
+    const imageData = captureAnalysisFrame(
+      video,
+      analysisCanvas,
+      settings.analysisWidth,
+      settings.analysisHeight
+    );
+
+    if (!lastAcceptedFrame) {
+      candidates.push({
+        time,
+        score: 1,
+        reason: "initial-frame",
+      });
+
+      lastAcceptedFrame = imageData;
+      lastAcceptedTime = time;
+      continue;
+    }
+
+    const diffRatio = frameDifferenceRatio(
+      lastAcceptedFrame,
+      imageData,
+      settings.pixelDeltaThreshold
+    );
+
+    if (
+      shouldAcceptCandidate({
+        diffRatio,
+        currentTime: time,
+        lastAcceptedTime,
+        changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
+        minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
+      })
+    ) {
+      candidates.push({
+        time,
+        score: diffRatio,
+        reason: "visual-change",
+      });
+
+      lastAcceptedFrame = imageData;
+      lastAcceptedTime = time;
+    }
+
+    onProgress(time / video.duration);
+
+    if (candidates.length >= settings.maxCandidates) {
+      // optional: stop early or continue and later rank candidates
+    }
+  }
+
+  return rankAndTrimCandidates(candidates, settings.finalTargetCount);
+}
+
+Candidate type:
+
+type CandidateFrame = {
+  time: number;
+  score: number;
+  reason: "initial-frame" | "visual-change" | "manual";
+  thumbnailUrl?: string;
+  fullBlob?: Blob;
+};
+8. Candidate ranking
+
+Since the desired final count is only 4–10, do not simply take every threshold crossing.
+
+Use ranking:
+
+Always include first frame if useful.
+Sort detected changes by score.
+Enforce time diversity.
+Keep the top N.
+
+Example:
+
+function rankAndTrimCandidates(
+  candidates: CandidateFrame[],
+  targetCount: number
+): CandidateFrame[] {
+  return [...candidates]
+    .sort((a, b) => b.score - a.score)
+    .slice(0, targetCount)
+    .sort((a, b) => a.time - b.time);
+}
+
+Better version:
+
+Divide video into N temporal buckets
+Pick the strongest candidate from each bucket
+Then fill remaining slots by highest score
+
+This avoids ending up with all images from one active segment.
+
+9. Full-resolution extraction
+
+Only render full-resolution images for selected candidates.
+
+async function extractFullResolutionFrame(
+  video: HTMLVideoElement,
+  time: number,
+  type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
+  quality?: number
+): Promise<Blob> {
+  await seekVideo(video, time);
+
+  const canvas = document.createElement("canvas");
+  canvas.width = video.videoWidth;
+  canvas.height = video.videoHeight;
+
+  const ctx = canvas.getContext("2d");
+  if (!ctx) throw new Error("Could not get canvas context");
+
+  ctx.drawImage(video, 0, 0);
+
+  return new Promise((resolve, reject) => {
+    canvas.toBlob(
+      blob => {
+        if (!blob) reject(new Error("Canvas export failed"));
+        else resolve(blob);
+      },
+      type,
+      quality
+    );
+  });
+}
+
+For clipboard, prefer PNG:
+
+async function copyImageBlobToClipboard(blob: Blob) {
+  await navigator.clipboard.write([
+    new ClipboardItem({
+      [blob.type]: blob,
+    }),
+  ]);
+}
+
+Use:
+
+const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
+await copyImageBlobToClipboard(blob);
+
+Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.
+
+10. UI design
+Main layout
+┌──────────────────────────────────────────────┐
+│ Video Key Frame Extractor                    │
+├──────────────────────────────────────────────┤
+│ [Select video]  filename.mp4                 │
+│ Duration: 04:32 | Size: 612 MB | 1920×1080   │
+├──────────────────────────────────────────────┤
+│ Scan Settings                                │
+│ Sample interval: [0.5s]                      │
+│ Change threshold: [18%]                      │
+│ Min gap: [2.0s]                              │
+│ Target images: [8]                           │
+│ [Scan Video]                                 │
+├──────────────────────────────────────────────┤
+│ Progress bar                                 │
+├──────────────────────────────────────────────┤
+│ Candidate Gallery                            │
+│ [thumb] 00:04.5 score .24 [Copy] [Download]  │
+│ [thumb] 00:18.0 score .41 [Copy] [Download]  │
+│ [thumb] 01:22.5 score .37 [Copy] [Download]  │
+└──────────────────────────────────────────────┘
+Controls
+
+Minimum controls:
+
+Select video
+Scan
+Stop scan
+Reset
+Threshold slider
+Sample interval dropdown
+Target count
+Copy image
+Download image
+
+Advanced controls:
+
+“Prefer fewer images”
+“Prefer more images”
+“Ignore tiny motion”
+“Include first frame”
+“Manually add current preview frame”
+11. State model
+
+Use Zustand or plain React reducer.
+
+type AppState = {
+  video: LoadedVideo | null;
+  scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
+  settings: ScanSettings;
+  candidates: CandidateFrame[];
+  selectedCandidateId: string | null;
+  progress: number;
+  error: string | null;
+};
+
+Actions:
+
+type AppActions = {
+  loadVideo(file: File): Promise<void>;
+  scanVideo(): Promise<void>;
+  cancelScan(): void;
+  updateSettings(settings: Partial<ScanSettings>): void;
+  copyCandidate(candidateId: string): Promise<void>;
+  downloadCandidate(candidateId: string): Promise<void>;
+  removeCandidate(candidateId: string): void;
+};
+12. Worker design
+
+For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.
+
+Main thread:
+
+seek video
+draw frame to analysis canvas
+get ImageData
+send ImageData buffer to worker
+worker returns diff score
+
+Worker message:
+
+type AnalyzeFrameMessage = {
+  type: "analyze-frame";
+  current: ImageData;
+  previous: ImageData; // the last accepted key frame, as in section 6.1
+  pixelDeltaThreshold: number;
+};
+
+Worker response:
+
+type AnalyzeFrameResult = {
+  type: "frame-score";
+  diffRatio: number;
+};
+
+For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.
+
+13. Performance expectations
+
+For a 10-minute video:
+
+Sample interval: 0.5s
+Comparisons: ~1200
+Analysis resolution: 160×90
+Pixels per comparison: 14,400
+Total pixel comparisons: ~17.3 million
+
+That is very reasonable in browser JavaScript.
+
+For a 30-minute video:
+
+Sample interval: 0.5s
+Comparisons: ~3600
+Analysis resolution: 160×90
+Total pixel comparisons: ~51.8 million
+
+Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
+
+14. File-size considerations
+
+Your 100–1000 MB range is practical because:
+
+The video is referenced via local file blob URL.
+You only decode sampled frames.
+You only store 4–10 output blobs.
+Analysis frames are tiny.
+
+Memory use should look roughly like:
+
+Video decoder buffers: browser-managed
+Analysis frame A: ~160 × 90 × 4 = 57.6 KB
+Analysis frame B: ~57.6 KB
+Full-res output image: only when accepted/copied
+Candidate blobs: 4–10 images
+
+At 1920×1080, one raw RGBA canvas frame is about:
+
+1920 × 1080 × 4 = ~8.3 MB
+
+At 3840×2160:
+
+3840 × 2160 × 4 = ~33.2 MB
+
+Still fine if you only create one full-res canvas at a time.
+
+15. Browser compatibility
+Best target
+
+Start with Chromium-based browsers:
+
+Chrome
+Edge
+Brave
+Arc
+Chromium
+
+Reasons:
+
+Strong media support.
+Good Clipboard API support.
+Good WebCodecs path for later.
+
+Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.
+
+Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.
+
+16. Security and privacy model
+
+The app should explicitly state:
+
+Your video is processed locally in your browser.
+The file is not uploaded.
+No server receives the video or extracted images.
+
+Implementation requirements:
+
+Do not send file contents to analytics.
+Do not log object URLs remotely.
+Avoid third-party scripts if privacy is part of the value prop.
+Serve over HTTPS.
+Make the app installable as a PWA if desired.
+17. Error handling
+
+Handle:
+
+Case                     Response
+Unsupported video codec  Show “This browser cannot decode this video.”
+Huge resolution          Offer lower-quality extraction or warn about memory.
+Clipboard blocked        Offer download fallback.
+Scan too slow            Let user increase sample interval.
+Too many candidates      Raise threshold or bucket results.
+No candidates found      Lower threshold or include first/middle/end frames.
+
+Clipboard fallback:
+
+try {
+  await copyImageBlobToClipboard(blob);
+} catch {
+  downloadBlob(blob, "frame.png"); // app-defined download helper, not shown here
+}
+18. Suggested file structure
+src/
+  app/
+    App.tsx
+    store.ts
+  components/
+    FilePicker.tsx
+    VideoMetadataPanel.tsx
+    ScanSettingsPanel.tsx
+    ProgressPanel.tsx
+    CandidateGallery.tsx
+    CandidateCard.tsx
+  media/
+    loadVideo.ts
+    seekVideo.ts
+    captureFrame.ts
+    extractFullFrame.ts
+    clipboard.ts
+  analysis/
+    frameDifference.ts
+    histogramDifference.ts
+    candidateRanking.ts
+    scanVideo.ts
+  workers/
+    frameAnalysis.worker.ts
+  types/
+    media.ts
+    scan.ts
+19. MVP implementation plan
+Phase 1: Core prototype
+
+Build:
+
+File picker
+Hidden video loader
+Metadata display
+Scan button
+Basic frame-difference detection
+Candidate list with timestamps
+Thumbnail generation
+
+No worker yet.
+
+Phase 2: Copy/export
+
+Add:
+
+Full-res frame extraction
+Copy PNG to clipboard
+Download PNG fallback
+Remove candidate
+Manual add candidate
+Phase 3: Better detection
+
+Add:
+
+Histogram comparison
+Temporal bucketing
+Threshold presets
+“Strict / Balanced / Sensitive” modes
+
+Example presets:
+
+const presets = {
+  strict: {
+    sampleIntervalSeconds: 1.0,
+    changedPixelRatioThreshold: 0.28,
+    minSecondsBetweenCaptures: 4,
+  },
+  balanced: {
+    sampleIntervalSeconds: 0.5,
+    changedPixelRatioThreshold: 0.18,
+    minSecondsBetweenCaptures: 2,
+  },
+  sensitive: {
+    sampleIntervalSeconds: 0.25,
+    changedPixelRatioThreshold: 0.1,
+    minSecondsBetweenCaptures: 1,
+  },
+};
+Phase 4: Worker optimization
+
+Move frame comparison into a worker.
+
+Phase 5: Advanced media backend
+
+Evaluate:
+
+Mediabunny + WebCodecs
+
+This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.
+
+20. Final recommendation
+
+For your app, build V1 with:
+
+Vite + React + TypeScript
+HTMLVideoElement
+Canvas
+Blob URLs
+Clipboard API
+Optional Zustand
+
+Do not start with:
+
+ffmpeg.wasm
+OpenCV.js
+Server-side processing
+
+Best V1 strategy:
+
+Sample every 0.5s
+Analyze at 160×90
+Compare against last accepted key frame
+Use threshold + minimum time gap
+Rank candidates down to 4–10
+Extract full-res only when displaying/copying
\ No newline at end of file
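
As a closing illustration of the "rank candidates down to 4–10" step, the temporal-bucket ranking described in section 8 might look roughly like the sketch below. `pickByTemporalBuckets` and `Scored` are hypothetical names, and using one bucket per target image is an assumption rather than something the design fixes.

```typescript
// Minimal shape needed for ranking; CandidateFrame from section 7 satisfies it.
type Scored = { time: number; score: number };

function pickByTemporalBuckets<T extends Scored>(
  candidates: T[],
  duration: number,
  targetCount: number
): T[] {
  if (targetCount <= 0 || duration <= 0) return [];

  // Assumption: one equal-length bucket per requested image.
  const bucketLength = duration / targetCount;
  const bestPerBucket: (T | undefined)[] = new Array(targetCount);

  // Keep the strongest candidate in each bucket.
  for (const c of candidates) {
    const raw = Math.floor(c.time / bucketLength);
    const bucket = Math.min(targetCount - 1, Math.max(0, raw));
    const current = bestPerBucket[bucket];
    if (current === undefined || c.score > current.score) {
      bestPerBucket[bucket] = c;
    }
  }

  const picked: T[] = [];
  for (const c of bestPerBucket) {
    if (c !== undefined) picked.push(c);
  }

  // Fill any empty slots with the highest-scoring leftovers.
  const rest = candidates
    .filter(c => picked.indexOf(c) === -1)
    .sort((a, b) => b.score - a.score);
  for (const c of rest) {
    if (picked.length >= targetCount) break;
    picked.push(c);
  }

  // Present results in playback order.
  return picked.sort((a, b) => a.time - b.time);
}
```

For a 100-second video with candidates at 1s, 2s, 3s (scores 0.9, 0.8, 0.7), 50s (0.2), and 90s (0.3), a target of 3 yields the frames at 1s, 50s, and 90s, whereas a plain top-3 by score would return only the first three seconds.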