Recommended approach

Use:

    <input type="file">
    → URL.createObjectURL(file)
    → hidden <video>
    → sample frames periodically
    → draw sampled frame to canvas
    → compare against previous accepted frame
    → keep only significant-change frames
    → user clicks Copy on selected image
    → navigator.clipboard.write()

URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.

Do not analyze every frame at full resolution

For change detection, use a small analysis canvas:

    video frame → draw to 160×90 or 320×180 canvas → compare pixels

Then only render the full-resolution frame when you decide it is worth keeping.

Example strategy:

    const ANALYSIS_W = 160;
    const ANALYSIS_H = 90;
    const SAMPLE_INTERVAL = 0.5; // seconds
    const CHANGE_THRESHOLD = 0.18; // tune this

That means a 10-minute video sampled every 0.5 seconds is only:

    10 min × 60 / 0.5 = 1200 comparisons

At 160×90, that is cheap.

Simple frame-difference algorithm

Use a downscaled canvas and compare luminance differences:

    // a and b are ImageData objects of identical size.
    // Returns the fraction of pixels whose luminance changed by more than 32.
    function frameDifference(a, b) {
      let changed = 0;
      const pixels = a.data.length / 4;
      for (let i = 0; i < a.data.length; i += 4) {
        const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
        const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];
        if (Math.abs(lumA - lumB) > 32) changed++;
      }
      return changed / pixels;
    }

Then:

    if (diff > CHANGE_THRESHOLD) {
      // save this timestamp as a candidate key frame
    }

Better heuristic

Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame. That avoids saving several frames during the same transition.

    Frame A accepted
    Frame B slightly different → reject
    Frame C slightly different → reject
    Frame D substantially different from A → accept

Also add a minimum time gap:

    const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;

Clipboard copying

Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem.
Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.

Example:

    async function copyCanvasToClipboard(canvas) {
      // toBlob passes null if the export fails, so reject instead of copying nothing.
      const blob = await new Promise((resolve, reject) =>
        canvas.toBlob(
          b => (b ? resolve(b) : reject(new Error("Canvas export failed"))),
          "image/png"
        )
      );
      await navigator.clipboard.write([
        new ClipboardItem({ "image/png": blob })
      ]);
    }

Important: this should happen from a user gesture, like clicking a Copy button.

Memory profile

This is light if implemented correctly:

- 1 local video blob URL
- 1 hidden video element
- 1 small analysis canvas
- 1 full-res preview canvas/image
- 4–10 saved output blobs

Avoid:

- Storing hundreds/thousands of full-res canvases
- Storing every sampled frame
- Using FileReader.readAsDataURL(video)
- Exporting a huge ZIP unless needed

For your target, keeping 4–10 PNG/WebP blobs in memory is fine.

Practical UX flow

I would build it like this:

1. User selects video.
2. App scans with adjustable settings:
   - sample interval
   - change threshold
   - minimum seconds between captures
   - max captures
3. App shows candidate thumbnails.
4. User deletes bad picks if needed.
5. Each thumbnail has:
   - Copy PNG
   - Download
   - timestamp
6. Optional: button to rescan with stricter/looser threshold.

Caveats

- Plain seeking may not be perfectly frame-accurate.
- Sampling every 0.25–1.0s is much more practical than decoding every frame.
- Browser codec support matters. MP4/H.264 is safest.
- Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.

For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.

My recommended V1 stack

- React/Vite
- HTMLVideoElement
- Canvas API
- Blob/Object URLs
- navigator.clipboard.write()
- Optional: localForage or IndexedDB if you want session persistence

No backend required. No upload required.

What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.
I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.

Recommendation

Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:

- Vite + React + TypeScript
- HTMLVideoElement
- Canvas / OffscreenCanvas
- Web Worker
- Blob URLs
- Clipboard API

For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a browser/WebAssembly port of FFmpeg, but many real-world reports and project issues revolve around large-file memory limits and browser stability.

For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.

Design Document: Local Video Key-Frame Extractor

1. Goal

Build a browser-only app that allows a user to:

- Select a local video file.
- Analyze it without uploading to a server.
- Detect frames where the image content has significantly changed.
- Present a small set of candidate key images.
- Let the user copy selected images to the clipboard.

Target input:

- Video size: 100–1000 MB
- Expected output: 4–10 still images
- Processing model: local-only
- Backend: none required

2.
Recommended architecture

    User selects video
      ↓
    Create local Blob URL
      ↓
    Hidden <video> element loads local file
      ↓
    Sampling engine seeks through video
      ↓
    Downscaled analysis canvas compares frames
      ↓
    Candidate detector stores timestamps
      ↓
    Full-resolution renderer extracts selected frames
      ↓
    Gallery shows thumbnails
      ↓
    User copies PNG to clipboard

Core browser APIs:

| Task | API |
| --- | --- |
| Local file selection | File API |
| Local video loading | URL.createObjectURL() |
| Video decoding/seeking | HTMLVideoElement |
| Pixel comparison | Canvas API |
| Off-main-thread analysis | Worker + OffscreenCanvas where available |
| Image export | canvas.toBlob() |
| Clipboard copy | navigator.clipboard.write() + ClipboardItem |

URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.

3. Framework choice

Best V1 stack

- Vite
- React
- TypeScript
- Zustand or simple React state
- Canvas API
- Web Worker

Why:

- Vite: fast local dev, easy static deployment.
- React: good for gallery, controls, progress UI, and stateful review flow.
- TypeScript: useful because media processing code quickly accumulates edge cases.
- No backend: app can deploy as static files.
- No heavy media dependency initially: browser decoding is enough for threshold-based change detection.

Suggested install:

    npm create vite@latest video-keyframe-extractor -- --template react-ts
    cd video-keyframe-extractor
    npm install
    npm run dev

Optional dependencies:

    npm install zustand

I would avoid extra UI frameworks unless you want polished controls quickly.

4. Library/API decision matrix

| Option | Use for V1? | Notes |
| --- | --- | --- |
| Native <video> + canvas | Yes | Best starting point. Simple, no server, no large WASM payload. |
| Web Worker | Yes | Keeps UI responsive during scans. |
| OffscreenCanvas | Yes, optional | Good for worker-side pixel analysis where supported. |
| WebCodecs | V2 | Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames. |
| Mediabunny | V2 / advanced | Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed. |
| ffmpeg.wasm | Avoid for V1 | Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection. |
| OpenCV.js | Probably avoid | Useful for advanced CV, but heavy for simple scene-difference detection. |

5. App modules

5.1 File intake module

Responsibilities:

- Accept local video file.
- Validate file type and size.
- Create blob URL.
- Load video metadata.
- Display duration, resolution, codec/browser compatibility status if possible.

Example state:

    type LoadedVideo = {
      file: File;
      objectUrl: string;
      duration: number;
      width: number;
      height: number;
    };

Important cleanup:

    URL.revokeObjectURL(objectUrl);

Never use this for the video:

    FileReader.readAsDataURL(file);

That would load and base64-expand the whole video in memory.

5.2 Scan settings module

Expose these controls:

    type ScanSettings = {
      sampleIntervalSeconds: number;       // default: 0.5
      analysisWidth: number;               // default: 160
      analysisHeight: number;              // default: 90
      pixelDeltaThreshold: number;         // default: 32
      changedPixelRatioThreshold: number;  // default: 0.18
      minSecondsBetweenCaptures: number;   // default: 2.0
      maxCandidates: number;               // default: 20
      finalTargetCount: number;            // default: 8
    };

Suggested defaults:

    const defaultSettings: ScanSettings = {
      sampleIntervalSeconds: 0.5,
      analysisWidth: 160,
      analysisHeight: 90,
      pixelDeltaThreshold: 32,
      changedPixelRatioThreshold: 0.18,
      minSecondsBetweenCaptures: 2.0,
      maxCandidates: 20,
      finalTargetCount: 8,
    };

5.3 Frame sampling module

V1 should seek using a hidden video element:

    // Resolves once the video has finished seeking to `time` (clamped to the duration).
    async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
      return new Promise((resolve, reject) => {
        const onSeeked = () => {
          cleanup();
          resolve();
        };
        const onError = () => {
          cleanup();
          reject(video.error);
        };
        const cleanup = () => {
          video.removeEventListener("seeked", onSeeked);
          video.removeEventListener("error", onError);
        };
        video.addEventListener("seeked", onSeeked, { once: true });
        video.addEventListener("error", onError, { once: true });
        video.currentTime = Math.min(time, video.duration);
      });
    }

Then draw into a small analysis canvas:

    function captureAnalysisFrame(
      video: HTMLVideoElement,
      canvas: HTMLCanvasElement,
      width: number,
      height: number
    ): ImageData {
      canvas.width = width;
      canvas.height = height;
      // willReadFrequently hints that getImageData will be called repeatedly.
      const ctx = canvas.getContext("2d", { willReadFrequently: true });
      if (!ctx) throw new Error("Could not get canvas context");
      ctx.drawImage(video, 0, 0, width, height);
      return ctx.getImageData(0, 0, width, height);
    }

6. Change detection algorithm

6.1 Simple and effective V1

Compare the current sampled frame against the last accepted key frame, not merely the immediately previous sample. This prevents over-detecting long transitions.

    // Returns the fraction of pixels whose luminance differs by more than pixelDeltaThreshold.
    function frameDifferenceRatio(
      a: ImageData,
      b: ImageData,
      pixelDeltaThreshold: number
    ): number {
      const dataA = a.data;
      const dataB = b.data;
      let changed = 0;
      const pixels = dataA.length / 4;
      for (let i = 0; i < dataA.length; i += 4) {
        const lumA = 0.299 * dataA[i] + 0.587 * dataA[i + 1] + 0.114 * dataA[i + 2];
        const lumB = 0.299 * dataB[i] + 0.587 * dataB[i + 1] + 0.114 * dataB[i + 2];
        if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
          changed++;
        }
      }
      return changed / pixels;
    }

Acceptance logic:

    function shouldAcceptCandidate(params: {
      diffRatio: number;
      currentTime: number;
      lastAcceptedTime: number | null;
      changedPixelRatioThreshold: number;
      minSecondsBetweenCaptures: number;
    }): boolean {
      const {
        diffRatio,
        currentTime,
        lastAcceptedTime,
        changedPixelRatioThreshold,
        minSecondsBetweenCaptures,
      } = params;
      // Reject frames that have not changed enough.
      if (diffRatio < changedPixelRatioThreshold) return false;
      // Reject frames that follow the last accepted frame too closely.
      if (
        lastAcceptedTime !== null &&
        currentTime - lastAcceptedTime < minSecondsBetweenCaptures
      ) {
        return false;
      }
      return true;
    }

6.2 Better V1.5: histogram difference

Raw pixel comparison can be sensitive to
motion, noise, flashes, compression artifacts, and camera pans. A better approach is to compare low-resolution color/luma histograms.

Recommended hybrid score:

    score = 0.7 × luminance histogram difference + 0.3 × pixel difference ratio

This tends to better detect scene/content changes instead of tiny motion.

6.3 Optional V2: perceptual hash

For more stable detection:

1. Generate a small grayscale frame, e.g. 32×32.
2. Compute average hash or difference hash.
3. Compare Hamming distance.
4. Accept when distance exceeds threshold.

This is good when videos have subtle compression differences but meaningful visual changes.

7. Scan workflow

Pseudo-code:

    async function scanVideo(
      video: HTMLVideoElement,
      settings: ScanSettings,
      onProgress: (progress: number) => void
    ): Promise<CandidateFrame[]> {
      const analysisCanvas = document.createElement("canvas");
      const candidates: CandidateFrame[] = [];
      let lastAcceptedFrame: ImageData | null = null;
      let lastAcceptedTime: number | null = null;

      for (
        let time = 0;
        time < video.duration;
        time += settings.sampleIntervalSeconds
      ) {
        await seekVideo(video, time);
        const imageData = captureAnalysisFrame(
          video,
          analysisCanvas,
          settings.analysisWidth,
          settings.analysisHeight
        );

        // The first sampled frame is always accepted as the baseline.
        if (!lastAcceptedFrame) {
          candidates.push({
            time,
            score: 1,
            reason: "initial-frame",
          });
          lastAcceptedFrame = imageData;
          lastAcceptedTime = time;
          continue;
        }

        const diffRatio = frameDifferenceRatio(
          lastAcceptedFrame,
          imageData,
          settings.pixelDeltaThreshold
        );

        if (
          shouldAcceptCandidate({
            diffRatio,
            currentTime: time,
            lastAcceptedTime,
            changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
            minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
          })
        ) {
          candidates.push({
            time,
            score: diffRatio,
            reason: "visual-change",
          });
          lastAcceptedFrame = imageData;
          lastAcceptedTime = time;
        }

        onProgress(time / video.duration);

        if (candidates.length >= settings.maxCandidates) {
          // optional: stop early or continue and later rank candidates
        }
      }

      return rankAndTrimCandidates(candidates, settings.finalTargetCount);
    }

Candidate type:

    type CandidateFrame = {
      time: number;
      score: number;
      reason: "initial-frame" | "visual-change" | "manual";
      thumbnailUrl?: string;
      fullBlob?: Blob;
    };

8. Candidate ranking

Since the desired final count is only 4–10, do not simply take every threshold crossing.

Use ranking:

1. Always include first frame if useful.
2. Sort detected changes by score.
3. Enforce time diversity.
4. Keep the top N.

Example:

    // Keeps the targetCount highest-scoring candidates, returned in chronological order.
    function rankAndTrimCandidates(
      candidates: CandidateFrame[],
      targetCount: number
    ): CandidateFrame[] {
      return [...candidates]
        .sort((a, b) => b.score - a.score)
        .slice(0, targetCount)
        .sort((a, b) => a.time - b.time);
    }

Better version:

1. Divide video into N temporal buckets
2. Pick the strongest candidate from each bucket
3. Then fill remaining slots by highest score

This avoids ending up with all images from one active segment.

9. Full-resolution extraction

Only render full-resolution images for selected candidates.

    async function extractFullResolutionFrame(
      video: HTMLVideoElement,
      time: number,
      type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
      quality?: number
    ): Promise<Blob> {
      await seekVideo(video, time);
      const canvas = document.createElement("canvas");
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext("2d");
      if (!ctx) throw new Error("Could not get canvas context");
      ctx.drawImage(video, 0, 0);
      return new Promise((resolve, reject) => {
        canvas.toBlob(
          blob => {
            if (!blob) reject(new Error("Canvas export failed"));
            else resolve(blob);
          },
          type,
          quality
        );
      });
    }

For clipboard, prefer PNG:

    async function copyImageBlobToClipboard(blob: Blob): Promise<void> {
      await navigator.clipboard.write([
        new ClipboardItem({
          [blob.type]: blob,
        }),
      ]);
    }

Use:

    const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
    await copyImageBlobToClipboard(blob);

Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or
localhost because the Clipboard API is gated behind secure contexts.

10. UI design

Main layout

    ┌──────────────────────────────────────────────┐
    │ Video Key Frame Extractor                    │
    ├──────────────────────────────────────────────┤
    │ [Select video] filename.mp4                  │
    │ Duration: 04:32 | Size: 612 MB | 1920×1080   │
    ├──────────────────────────────────────────────┤
    │ Scan Settings                                │
    │ Sample interval: [0.5s]                      │
    │ Change threshold: [18%]                      │
    │ Min gap: [2.0s]                              │
    │ Target images: [8]                           │
    │ [Scan Video]                                 │
    ├──────────────────────────────────────────────┤
    │ Progress bar                                 │
    ├──────────────────────────────────────────────┤
    │ Candidate Gallery                            │
    │ [thumb] 00:04.5 score .24 [Copy] [Download]  │
    │ [thumb] 00:18.0 score .41 [Copy] [Download]  │
    │ [thumb] 01:22.5 score .37 [Copy] [Download]  │
    └──────────────────────────────────────────────┘

Controls

Minimum controls:

- Select video
- Scan
- Stop scan
- Reset
- Threshold slider
- Sample interval dropdown
- Target count
- Copy image
- Download image

Advanced controls:

- “Prefer fewer images”
- “Prefer more images”
- “Ignore tiny motion”
- “Include first frame”
- “Manually add current preview frame”

11. State model

Use Zustand or plain React reducer.

    type AppState = {
      video: LoadedVideo | null;
      scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
      settings: ScanSettings;
      candidates: CandidateFrame[];
      selectedCandidateId: string | null;
      progress: number;
      error: string | null;
    };

Actions:

    type AppActions = {
      loadVideo(file: File): Promise<void>;
      scanVideo(): Promise<void>;
      cancelScan(): void;
      updateSettings(settings: Partial<ScanSettings>): void;
      copyCandidate(candidateId: string): Promise<void>;
      downloadCandidate(candidateId: string): Promise<void>;
      removeCandidate(candidateId: string): void;
    };

12. Worker design

For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.
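Because scoring is a pure function over pixel buffers, it is the piece that moves most easily off the main thread. As a rough sketch of what such a function could look like, here is the hybrid score from section 6.2 written against raw RGBA arrays. The 16-bin histogram, the L1 histogram distance, and the helper names (`lumaHistogram`, `hybridScore`, and so on) are illustrative assumptions, not settled design choices:

```typescript
// Sketch only: bin count, distance metric, and function names are assumptions.
const BINS = 16;

// Rec. 601 luminance of the RGBA pixel starting at index i.
function luma(data: Uint8ClampedArray, i: number): number {
  return 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
}

// Normalized luminance histogram; the bins sum to 1.
function lumaHistogram(data: Uint8ClampedArray): Float64Array {
  const hist = new Float64Array(BINS);
  const pixels = data.length / 4;
  for (let i = 0; i < data.length; i += 4) {
    const bin = Math.min(BINS - 1, Math.floor((luma(data, i) / 256) * BINS));
    hist[bin] += 1 / pixels;
  }
  return hist;
}

// Half the L1 distance between the two histograms, so the result lies in [0, 1].
function histogramDifference(a: Uint8ClampedArray, b: Uint8ClampedArray): number {
  const ha = lumaHistogram(a);
  const hb = lumaHistogram(b);
  let sum = 0;
  for (let k = 0; k < BINS; k++) sum += Math.abs(ha[k] - hb[k]);
  return sum / 2;
}

// Fraction of pixels whose luminance changed by more than the threshold.
function pixelDifferenceRatio(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  pixelDeltaThreshold: number
): number {
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    if (Math.abs(luma(a, i) - luma(b, i)) > pixelDeltaThreshold) changed++;
  }
  return changed / (a.length / 4);
}

// Weighted blend: 0.7 × histogram difference + 0.3 × pixel difference ratio.
function hybridScore(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  pixelDeltaThreshold = 32
): number {
  if (a.length !== b.length) throw new Error("Frames must have the same size");
  return (
    0.7 * histogramDifference(a, b) +
    0.3 * pixelDifferenceRatio(a, b, pixelDeltaThreshold)
  );
}
```

In a worker, `hybridScore` would be called with the `data` arrays of the two ImageData objects received in a message; nothing in it touches the DOM.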
Main thread:

1. seek video
2. draw frame to analysis canvas
3. get ImageData
4. send ImageData buffer to worker
5. worker returns diff score

Worker message:

    type AnalyzeFrameMessage = {
      type: "analyze-frame";
      current: ImageData;
      previous: ImageData;
      pixelDeltaThreshold: number;
    };

Worker response:

    type AnalyzeFrameResult = {
      type: "frame-score";
      diffRatio: number;
    };

For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.

13. Performance expectations

For a 10-minute video:

- Sample interval: 0.5s
- Comparisons: ~1200
- Analysis resolution: 160×90
- Pixels per comparison: 14,400
- Total pixel comparisons: ~17.3 million

That is very reasonable in browser JavaScript.

For a 30-minute video:

- Sample interval: 0.5s
- Comparisons: ~3600
- Analysis resolution: 160×90
- Total pixel comparisons: ~51.8 million

Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.

14. File-size considerations

Your 100–1000 MB range is practical because:

- The video is referenced via local file blob URL.
- You only decode sampled frames.
- You only store 4–10 output blobs.
- Analysis frames are tiny.

Memory use should look roughly like:

- Video decoder buffers: browser-managed
- Analysis frame A: ~160 × 90 × 4 = 57.6 KB
- Analysis frame B: ~57.6 KB
- Full-res output image: only when accepted/copied
- Candidate blobs: 4–10 images

At 1920×1080, one raw RGBA canvas frame is about:

    1920 × 1080 × 4 = ~8.3 MB

At 3840×2160:

    3840 × 2160 × 4 = ~33.2 MB

Still fine if you only create one full-res canvas at a time.

15. Browser compatibility

Best target

Start with Chromium-based browsers:

- Chrome
- Edge
- Brave
- Arc
- Chromium

Reasons:

- Strong media support.
- Good Clipboard API support.
- Good WebCodecs path for later.

Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.
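Rather than branching on browser names or versions, the app can probe for the relevant capabilities at startup and enable or hide features accordingly. The following is a minimal sketch; the `Capabilities` shape and the `detectCapabilities` name are hypothetical, and the `scope` parameter exists only so the function can be exercised outside a browser:

```typescript
// Sketch only: the Capabilities type and function name are illustrative.
type Capabilities = {
  secureContext: boolean;
  clipboardImageWrite: boolean;
  webCodecs: boolean;
  offscreenCanvas: boolean;
};

function detectCapabilities(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): Capabilities {
  const nav = scope["navigator"] as
    | { clipboard?: { write?: unknown } }
    | undefined;
  return {
    // The Clipboard API is only exposed in secure contexts (HTTPS or localhost).
    secureContext: scope["isSecureContext"] === true,
    // Image copy needs both navigator.clipboard.write and the ClipboardItem constructor.
    clipboardImageWrite:
      typeof nav?.clipboard?.write === "function" &&
      typeof scope["ClipboardItem"] === "function",
    // WebCodecs path for a later version.
    webCodecs: typeof scope["VideoDecoder"] === "function",
    // Worker-side canvas analysis.
    offscreenCanvas: typeof scope["OffscreenCanvas"] === "function",
  };
}
```

A result with `clipboardImageWrite: false` would be a reasonable trigger for showing only the Download button. Passing this check does not guarantee that a given write succeeds, since permission and user-activation rules still apply at call time.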
Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.

16. Security and privacy model

The app should explicitly state:

    Your video is processed locally in your browser.
    The file is not uploaded.
    No server receives the video or extracted images.

Implementation requirements:

- Do not send file contents to analytics.
- Do not log object URLs remotely.
- Avoid third-party scripts if privacy is part of the value prop.
- Serve over HTTPS.
- Make the app installable as a PWA if desired.

17. Error handling

Handle:

| Case | Response |
| --- | --- |
| Unsupported video codec | Show “This browser cannot decode this video.” |
| Huge resolution | Offer lower-quality extraction or warn about memory. |
| Clipboard blocked | Offer download fallback. |
| Scan too slow | Let user increase sample interval. |
| Too many candidates | Raise threshold or bucket results. |
| No candidates found | Lower threshold or include first/middle/end frames. |

Clipboard fallback:

    try {
      await copyImageBlobToClipboard(blob);
    } catch {
      downloadBlob(blob, "frame.png");
    }

18. Suggested file structure

    src/
      app/
        App.tsx
        store.ts
      components/
        FilePicker.tsx
        VideoMetadataPanel.tsx
        ScanSettingsPanel.tsx
        ProgressPanel.tsx
        CandidateGallery.tsx
        CandidateCard.tsx
      media/
        loadVideo.ts
        seekVideo.ts
        captureFrame.ts
        extractFullFrame.ts
        clipboard.ts
      analysis/
        frameDifference.ts
        histogramDifference.ts
        candidateRanking.ts
        scanVideo.ts
      workers/
        frameAnalysis.worker.ts
      types/
        media.ts
        scan.ts

19. MVP implementation plan

Phase 1: Core prototype

Build:

- File picker
- Hidden video loader
- Metadata display
- Scan button
- Basic frame-difference detection
- Candidate list with timestamps
- Thumbnail generation

No worker yet.
Phase 2: Copy/export

Add:

- Full-res frame extraction
- Copy PNG to clipboard
- Download PNG fallback
- Remove candidate
- Manual add candidate

Phase 3: Better detection

Add:

- Histogram comparison
- Temporal bucketing
- Threshold presets
- “Strict / Balanced / Sensitive” modes

Example presets:

    const presets = {
      strict: {
        sampleIntervalSeconds: 1.0,
        changedPixelRatioThreshold: 0.28,
        minSecondsBetweenCaptures: 4,
      },
      balanced: {
        sampleIntervalSeconds: 0.5,
        changedPixelRatioThreshold: 0.18,
        minSecondsBetweenCaptures: 2,
      },
      sensitive: {
        sampleIntervalSeconds: 0.25,
        changedPixelRatioThreshold: 0.1,
        minSecondsBetweenCaptures: 1,
      },
    };

Phase 4: Worker optimization

Move frame comparison into a worker.

Phase 5: Advanced media backend

Evaluate:

- Mediabunny + WebCodecs

This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.

20. Final recommendation

For your app, build V1 with:

- Vite + React + TypeScript
- HTMLVideoElement
- Canvas
- Blob URLs
- Clipboard API
- Optional Zustand

Do not start with:

- ffmpeg.wasm
- OpenCV.js
- Server-side processing

Best V1 strategy:

1. Sample every 0.5s
2. Analyze at 160×90
3. Compare against last accepted key frame
4. Use threshold + minimum time gap
5. Rank candidates down to 4–10
6. Extract full-res only when displaying/copying
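The "rank candidates down to 4–10" step is where the temporal-bucketing idea from section 8 applies. One possible sketch follows, assuming equal-width buckets (one per target image) and a simple highest-score fill for any unused slots. The `Candidate` type is a trimmed stand-in for CandidateFrame, and `rankWithTemporalBuckets` is a hypothetical name:

```typescript
// Sketch only: equal-width buckets and the fill rule are assumptions.
type Candidate = { time: number; score: number };

function rankWithTemporalBuckets(
  candidates: Candidate[],
  duration: number,
  targetCount: number
): Candidate[] {
  if (targetCount <= 0 || duration <= 0) return [];
  const bucketSize = duration / targetCount;

  // Step 1: keep the strongest candidate in each temporal bucket.
  const bestPerBucket = new Map<number, Candidate>();
  for (const c of candidates) {
    const bucket = Math.min(
      targetCount - 1,
      Math.max(0, Math.floor(c.time / bucketSize))
    );
    const current = bestPerBucket.get(bucket);
    if (!current || c.score > current.score) bestPerBucket.set(bucket, c);
  }

  // Step 2: fill any remaining slots with the highest-scoring leftovers.
  const picked = new Set<Candidate>(Array.from(bestPerBucket.values()));
  const leftovers = candidates
    .filter(c => !picked.has(c))
    .sort((a, b) => b.score - a.score);
  for (const c of leftovers) {
    if (picked.size >= targetCount) break;
    picked.add(c);
  }

  // Return in chronological order for display.
  return Array.from(picked).sort((a, b) => a.time - b.time);
}
```

Compared with a plain top-N sort, this keeps one busy stretch of the video from taking every slot. A burst of high-scoring candidates in the first few seconds contributes only its single best frame until the other buckets have had their pick.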