Recommended approach
Use a local blob URL for the selected file:
URL.createObjectURL(file) gives the browser a local blob URL for the selected file; release it with URL.revokeObjectURL() when you are done.
Do not analyze every frame at full resolution
For change detection, use a small analysis canvas:
video frame → draw to 160×90 or 320×180 canvas → compare pixels
Then only render the full-resolution frame when you decide it is worth keeping.
Example strategy:
const ANALYSIS_W = 160;
const ANALYSIS_H = 90;
const SAMPLE_INTERVAL = 0.5;   // seconds
const CHANGE_THRESHOLD = 0.18; // fraction of changed pixels; tune this
That means a 10-minute video sampled every 0.5 seconds is only:
10 min × 60 / 0.5 = 1200 comparisons
At 160×90, that is cheap.
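To make the sample count concrete, here is a minimal sketch of a hypothetical sampleTimes helper (not part of any library) that lists the seek targets. It computes each timestamp as index × interval instead of repeatedly adding the interval. That is an assumed design choice that keeps long scans from accumulating floating-point error.

```typescript
// Hypothetical helper: the timestamps (in seconds) the scanner will seek to.
// Each time is computed as i * interval, so no rounding error accumulates.
function sampleTimes(durationSeconds: number, intervalSeconds: number): number[] {
  const times: number[] = [];
  // Guard against a zero/negative interval or an unknown (infinite) duration
  if (!(intervalSeconds > 0) || !Number.isFinite(durationSeconds)) return times;
  for (let i = 0; i * intervalSeconds < durationSeconds; i++) {
    times.push(i * intervalSeconds);
  }
  return times;
}

// A 10-minute video sampled every 0.5 s yields 1200 seek targets (0 to 599.5 s).
const times = sampleTimes(10 * 60, 0.5);
```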
Simple frame-difference algorithm
Use a downscaled canvas and compare luminance differences:
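function frameDifference(a, b) {
  // a and b are ImageData objects of the same size (RGBA, 4 bytes per pixel)
  let changed = 0;
  const pixels = a.data.length / 4;

  for (let i = 0; i < a.data.length; i += 4) {
    // Rec. 601 luma weights for R, G, B
    const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
    const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];

    // Count a pixel as changed if its brightness moved by more than 32 (of 255)
    if (Math.abs(lumA - lumB) > 32) changed++;
  }

  // Fraction of pixels that changed, from 0 to 1
  return changed / pixels;
}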
Then:
if (diff > CHANGE_THRESHOLD) {
  // save this timestamp as a candidate key frame
}

Better heuristic
Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.
That avoids saving several frames during the same transition.
Frame A: accepted
Frame B: slightly different → reject
Frame C: slightly different → reject
Frame D: substantially different from A → accept
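The comparison against the last accepted frame can be sketched as a pure function. This is illustrative only: it assumes frames are already downscaled to grayscale Uint8Array buffers (one luma byte per pixel), and the names LumaFrame, changedRatio and selectAgainstLastAccepted are hypothetical.

```typescript
// Hypothetical sketch: frames are downscaled grayscale buffers, one luma byte per pixel.
type LumaFrame = { time: number; luma: Uint8Array };

// Fraction of pixels whose brightness moved by more than `pixelDelta`.
function changedRatio(a: Uint8Array, b: Uint8Array, pixelDelta: number): number {
  let changed = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > pixelDelta) changed++;
  }
  return a.length === 0 ? 0 : changed / a.length;
}

// Returns the timestamps of accepted key frames.
function selectAgainstLastAccepted(
  frames: LumaFrame[],
  ratioThreshold: number,
  pixelDelta = 32
): number[] {
  const accepted: number[] = [];
  let last: LumaFrame | null = null;
  for (const frame of frames) {
    // Always accept the first sample; afterwards compare with the last ACCEPTED frame
    if (last === null || changedRatio(last.luma, frame.luma, pixelDelta) > ratioThreshold) {
      accepted.push(frame.time);
      last = frame;
    }
  }
  return accepted;
}

// A gradual transition: each sample differs only a little from the previous one.
const makeFrame = (time: number, brightPixels: number): LumaFrame => ({
  time,
  luma: Uint8Array.from({ length: 10 }, (_, i) => (i < brightPixels ? 100 : 0)),
});
const frames = [makeFrame(0, 0), makeFrame(1, 2), makeFrame(2, 4), makeFrame(3, 8)];
const keyTimes = selectAgainstLastAccepted(frames, 0.5); // [0, 3]
```

In this test data no sample differs from its predecessor by more than 40% of pixels, so a previous-sample comparison at 0.5 would never fire after the first frame, while comparing against A accepts D once the accumulated change reaches 80%.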
Also add a minimum time gap:
const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;

Clipboard copying
Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.
Example:
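async function copyCanvasToClipboard(canvas) {
  const blob = await new Promise(resolve =>
    canvas.toBlob(resolve, "image/png")
  );
  // toBlob passes null if the canvas could not be encoded
  if (!blob) throw new Error("Canvas export failed");

  await navigator.clipboard.write([
    new ClipboardItem({ "image/png": blob })
  ]);
}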
Important: this should happen from a user gesture, like clicking a Copy button.
Memory profile
This is light if implemented correctly:
1 local video blob URL
1 hidden video element
1 small analysis canvas
1 full-res preview canvas/image
4–10 saved output blobs
Avoid:
Storing hundreds/thousands of full-res canvases
Storing every sampled frame
Using FileReader.readAsDataURL(video)
Exporting a huge ZIP unless needed
For your target, keeping 4–10 PNG/WebP blobs in memory is fine.
Practical UX flow
I would build it like this:
1. User selects video.
2. App scans with adjustable settings:
   sample interval
   change threshold
   minimum seconds between captures
   max captures
3. App shows candidate thumbnails.
4. User deletes bad picks if needed.
5. Each thumbnail has:
   Copy PNG
   Download
   timestamp
6. Optional: button to rescan with a stricter/looser threshold.

Caveats

Plain seeking may not be perfectly frame-accurate.
Sampling every 0.25–1.0 s is much more practical than decoding every frame.
Browser codec support matters; MP4/H.264 is the safest choice.
Safari/Firefox clipboard image support can be more limited than in Chromium-based browsers.
For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, HTMLVideoElement + canvas is probably enough.

My recommended V1 stack

React/Vite
HTMLVideoElement
Canvas API
Blob/Object URLs
navigator.clipboard.write()
Optional: localForage or IndexedDB if you want session persistence
No backend required. No upload required.
What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.
I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.
Recommendation
Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:
Vite + React + TypeScript
HTMLVideoElement
Canvas / OffscreenCanvas
Web Worker
Blob URLs
Clipboard API
For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a browser/WebAssembly port of FFmpeg, but many real-world reports and project issues revolve around large-file memory limits and browser stability.
For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.
Design Document: Local Video Key-Frame Extractor
Build a browser-only app that allows a user to:
Select a local video file.
Analyze it without uploading to a server.
Detect frames where the image content has significantly changed.
Present a small set of candidate key images.
Let the user copy selected images to the clipboard.
Target input:
Video size: 100–1000 MB
Expected output: 4–10 still images
Processing model: local-only
Backend: none required

2. Recommended architecture

User selects video
↓
Create local Blob URL
↓
Hidden video element loads local file
↓
Sampling engine seeks through video
↓
Downscaled analysis canvas compares frames
↓
Candidate detector stores timestamps
↓
Full-resolution renderer extracts selected frames
↓
Gallery shows thumbnails
↓
User copies PNG to clipboard
Core browser APIs:
Task: API
Local file selection: File API
Local video loading: URL.createObjectURL()
Video decoding/seeking: HTMLVideoElement
Pixel comparison: Canvas API
Off-main-thread analysis: Worker + OffscreenCanvas where available
Image export: canvas.toBlob()
Clipboard copy: navigator.clipboard.write() + ClipboardItem
URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.
Why:
Vite: fast local dev, easy static deployment.
React: good for gallery, controls, progress UI, and stateful review flow.
TypeScript: useful because media processing code quickly accumulates edge cases.
No backend: app can deploy as static files.
No heavy media dependency initially: browser decoding is enough for threshold-based change detection.
Suggested install:
npm create vite@latest video-keyframe-extractor -- --template react-ts
cd video-keyframe-extractor
npm install
npm run dev
Optional dependencies:
npm install zustand
I would avoid extra UI frameworks unless you want polished controls quickly.
Responsibilities:
Accept local video file.
Validate file type and size.
Create blob URL.
Load video metadata.
Display duration, resolution, and codec/browser compatibility status if possible.
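Validation can stay cheap and synchronous. A sketch, assuming a 1 GiB cap derived from the 100–1000 MB target (adjust as needed); validateVideoFile and FileLike are hypothetical names, and the MIME check is only a hint, because File.type is derived from the file name and can be an empty string.

```typescript
// Hypothetical validation helper. It accepts any object with File's name/type/size
// fields, so it can be unit-tested without a browser.
type FileLike = { name: string; type: string; size: number };

// Assumed cap: 1 GiB, matching the 100–1000 MB target range.
const MAX_VIDEO_BYTES = 1024 * 1024 * 1024;

// Returns an error message, or null if the file looks acceptable.
function validateVideoFile(file: FileLike, maxBytes = MAX_VIDEO_BYTES): string | null {
  // An empty type means the browser could not infer a MIME type; let the decoder decide.
  if (file.type !== "" && !file.type.startsWith("video/")) {
    return `"${file.name}" does not look like a video (${file.type}).`;
  }
  if (file.size === 0) return `"${file.name}" is empty.`;
  if (file.size > maxBytes) {
    return `"${file.name}" is larger than ${Math.round(maxBytes / 1e6)} MB.`;
  }
  // Passing here does not prove the browser can decode it; that is only known
  // once the video element fires loadedmetadata.
  return null;
}
```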
Example state:
type LoadedVideo = {
  file: File;
  objectUrl: string;
  duration: number;
  width: number;
  height: number;
};
Important cleanup:
URL.revokeObjectURL(objectUrl);
Never use this for the video:
FileReader.readAsDataURL(file);
That would load and base64-expand the whole video in memory.
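A minimal sketch of the loading step, assuming the caller supplies the hidden video element. It creates the blob URL, waits for loadedmetadata (after which duration, videoWidth and videoHeight are available), and revokes the URL itself if loading fails. The function name and error message are illustrative, not a fixed API.

```typescript
// The LoadedVideo shape is repeated so this sketch stands alone.
type LoadedVideo = { file: File; objectUrl: string; duration: number; width: number; height: number };

function loadVideo(file: File, video: HTMLVideoElement): Promise<LoadedVideo> {
  // References the file on disk; nothing is read into JavaScript memory here.
  const objectUrl = URL.createObjectURL(file);

  return new Promise((resolve, reject) => {
    const cleanup = () => {
      video.removeEventListener("loadedmetadata", onLoaded);
      video.removeEventListener("error", onError);
    };
    const onLoaded = () => {
      cleanup();
      resolve({
        file,
        objectUrl,
        duration: video.duration,
        width: video.videoWidth,
        height: video.videoHeight,
      });
    };
    const onError = () => {
      cleanup();
      URL.revokeObjectURL(objectUrl); // nothing else will use this URL
      reject(new Error("This browser cannot decode this video."));
    };

    video.addEventListener("loadedmetadata", onLoaded);
    video.addEventListener("error", onError);
    video.muted = true;
    video.preload = "metadata"; // enough to read duration and dimensions
    video.src = objectUrl;
  });
}
```

On success the caller owns objectUrl and should revoke it when the user loads another file or resets.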
5.2 Scan settings module
Expose these controls:
type ScanSettings = {
  sampleIntervalSeconds: number;      // default: 0.5
  analysisWidth: number;              // default: 160
  analysisHeight: number;             // default: 90
  pixelDeltaThreshold: number;        // default: 32
  changedPixelRatioThreshold: number; // default: 0.18
  minSecondsBetweenCaptures: number;  // default: 2.0
  maxCandidates: number;              // default: 20
  finalTargetCount: number;           // default: 8
};
Suggested defaults:
const defaultSettings: ScanSettings = {
  sampleIntervalSeconds: 0.5,
  analysisWidth: 160,
  analysisHeight: 90,
  pixelDeltaThreshold: 32,
  changedPixelRatioThreshold: 0.18,
  minSecondsBetweenCaptures: 2.0,
  maxCandidates: 20,
  finalTargetCount: 8,
};

5.3 Frame sampling module
V1 should seek using a hidden video element:
async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const onSeeked = () => {
      cleanup();
      resolve();
    };

    const onError = () => {
      cleanup();
      reject(video.error);
    };

    const cleanup = () => {
      video.removeEventListener("seeked", onSeeked);
      video.removeEventListener("error", onError);
    };

    video.addEventListener("seeked", onSeeked, { once: true });
    video.addEventListener("error", onError, { once: true });

    // Clamp so we never seek past the end of the video
    video.currentTime = Math.min(time, video.duration);
  });
}
Then draw into a small analysis canvas:
function captureAnalysisFrame(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  width: number,
  height: number
): ImageData {
  canvas.width = width;
  canvas.height = height;

  // willReadFrequently hints that getImageData() will be called often
  const ctx = canvas.getContext("2d", { willReadFrequently: true });
  if (!ctx) throw new Error("Could not get canvas context");

  // drawImage scales the current video frame down to the analysis size
  ctx.drawImage(video, 0, 0, width, height);
  return ctx.getImageData(0, 0, width, height);
}

6. Change detection algorithm

6.1 Simple and effective V1
Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.
This prevents over-detecting long transitions.
function frameDifferenceRatio(
  a: ImageData,
  b: ImageData,
  pixelDeltaThreshold: number
): number {
  const dataA = a.data;
  const dataB = b.data;
  let changed = 0;
  const pixels = dataA.length / 4;

  for (let i = 0; i < dataA.length; i += 4) {
    const lumA = 0.299 * dataA[i] + 0.587 * dataA[i + 1] + 0.114 * dataA[i + 2];
    const lumB = 0.299 * dataB[i] + 0.587 * dataB[i + 1] + 0.114 * dataB[i + 2];

    if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
      changed++;
    }
  }

  return changed / pixels;
}
Acceptance logic:
function shouldAcceptCandidate(params: {
  diffRatio: number;
  currentTime: number;
  lastAcceptedTime: number | null;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
}): boolean {
  const {
    diffRatio,
    currentTime,
    lastAcceptedTime,
    changedPixelRatioThreshold,
    minSecondsBetweenCaptures,
  } = params;

  if (diffRatio < changedPixelRatioThreshold) return false;

  if (
    lastAcceptedTime !== null &&
    currentTime - lastAcceptedTime < minSecondsBetweenCaptures
  ) {
    return false;
  }

  return true;
}

6.2 Better V1.5: histogram difference
Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.
A better approach is to compare low-resolution color/luma histograms.
Recommended hybrid score:
score = 0.7 × luminance histogram difference + 0.3 × pixel difference ratio
This tends to better detect scene/content changes instead of tiny motion.
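A sketch of the histogram half of the hybrid score, assuming RGBA data from getImageData() and 32 luma bins (an arbitrary choice). The distance used here is half the L1 distance between normalized histograms, so it ranges from 0 (identical distributions) to 1 (no overlap); that metric and the function names are assumptions, not something the design prescribes.

```typescript
// Normalized luminance histogram of RGBA pixel data (bins sum to 1).
function lumaHistogram(rgba: Uint8ClampedArray, bins = 32): Float64Array {
  const hist = new Float64Array(bins);
  const pixels = rgba.length / 4;
  if (pixels === 0) return hist;
  for (let i = 0; i < rgba.length; i += 4) {
    const lum = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    // Map 0..255 onto 0..bins-1; clamp in case of rounding at the top end
    hist[Math.min(bins - 1, Math.floor((lum / 256) * bins))]++;
  }
  for (let b = 0; b < bins; b++) hist[b] /= pixels;
  return hist;
}

// Half the L1 distance: 0 = identical distributions, 1 = no overlap at all.
function histogramDifference(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / 2;
}

// The weighted score from the formula above.
function hybridScore(histDiff: number, pixelDiffRatio: number): number {
  return 0.7 * histDiff + 0.3 * pixelDiffRatio;
}
```

Because a histogram ignores where pixels are, an object moving across a static background barely changes it, while a cut to a visually different scene usually shifts it substantially.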
6.3 Optional V2: perceptual hash
For more stable detection:
Generate a small grayscale frame, e.g. 32×32.
Compute an average hash or difference hash.
Compare Hamming distance.
Accept when the distance exceeds a threshold.
This is good when videos have subtle compression differences but meaningful visual changes.
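A minimal average-hash sketch, the first of the two hash options listed above. It assumes the 32×32 frame has already been converted to a grayscale array, so the hash has 1,024 bits stored one per byte for clarity. averageHash and hammingDistance are illustrative names, and this is not a DCT-based pHash.

```typescript
// Average hash: one bit per pixel, set when that pixel is brighter than the frame mean.
function averageHash(gray: ArrayLike<number>): Uint8Array {
  let sum = 0;
  for (let i = 0; i < gray.length; i++) sum += gray[i];
  const mean = gray.length === 0 ? 0 : sum / gray.length;
  const bits = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) bits[i] = gray[i] > mean ? 1 : 0;
  return bits;
}

// Number of positions where the two hashes differ.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hashes must have the same length");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Accept a frame when hammingDistance(lastAcceptedHash, currentHash) exceeds a tuned threshold.
```

Since each bit only records whether a pixel is above the frame's mean, a uniform brightness shift leaves the hash unchanged, which is part of why hashes tolerate small encoding differences better than raw pixel deltas.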
Pseudo-code:
async function scanVideo(
  video: HTMLVideoElement,
  settings: ScanSettings,
  onProgress: (progress: number) => void
): Promise<CandidateFrame[]> {
  const analysisCanvas = document.createElement("canvas");

  const candidates: CandidateFrame[] = [];
  let lastAcceptedFrame: ImageData | null = null;
  let lastAcceptedTime: number | null = null;

  for (
    let time = 0;
    time < video.duration;
    time += settings.sampleIntervalSeconds
  ) {
    await seekVideo(video, time);

    const imageData = captureAnalysisFrame(
      video,
      analysisCanvas,
      settings.analysisWidth,
      settings.analysisHeight
    );

    // The first sample always becomes the first key frame
    if (!lastAcceptedFrame) {
      candidates.push({ time, score: 1, reason: "initial-frame" });
      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
      continue;
    }

    // Compare against the last accepted key frame, not the previous sample
    const diffRatio = frameDifferenceRatio(
      lastAcceptedFrame,
      imageData,
      settings.pixelDeltaThreshold
    );

    if (
      shouldAcceptCandidate({
        diffRatio,
        currentTime: time,
        lastAcceptedTime,
        changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
        minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
      })
    ) {
      candidates.push({ time, score: diffRatio, reason: "visual-change" });
      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
    }

    onProgress(time / video.duration);

    if (candidates.length >= settings.maxCandidates) {
      // optional: stop early (break) or continue and rank candidates later
    }
  }

  onProgress(1);
  return rankAndTrimCandidates(candidates, settings.finalTargetCount);
}
Candidate type:
type CandidateFrame = {
  time: number;
  score: number;
  reason: "initial-frame" | "visual-change" | "manual";
  thumbnailUrl?: string;
  fullBlob?: Blob;
};

8. Candidate ranking
Since the desired final count is only 4–10, do not simply take every threshold crossing.
Use ranking:
Always include the first frame if useful.
Sort detected changes by score.
Enforce time diversity.
Keep the top N.
function rankAndTrimCandidates(
  candidates: CandidateFrame[],
  targetCount: number
): CandidateFrame[] {
  return [...candidates]
    .sort((a, b) => b.score - a.score) // strongest changes first
    .slice(0, targetCount)
    .sort((a, b) => a.time - b.time);  // back to chronological order
}
Better version:
Divide the video into N temporal buckets.
Pick the strongest candidate from each bucket.
Then fill remaining slots by highest score.
This avoids ending up with all images from one active segment.
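The bucketed variant can be sketched as follows. It assumes candidates carry time and score as in CandidateFrame and that the video duration is known; rankWithTemporalBuckets is a hypothetical name. With targetCount equal-width buckets, at most one pick comes from each bucket before leftovers fill the empty slots by score.

```typescript
// Minimal shape needed for ranking; CandidateFrame satisfies it.
type RankedCandidate = { time: number; score: number };

function rankWithTemporalBuckets<T extends RankedCandidate>(
  candidates: T[],
  duration: number,
  targetCount: number
): T[] {
  if (targetCount <= 0 || candidates.length === 0) return [];

  // Equal-width time buckets; an unknown duration collapses everything into bucket 0.
  const bucketSize = duration > 0 ? duration / targetCount : Infinity;
  const bestPerBucket = new Map<number, T>();
  for (const c of candidates) {
    const bucket = Math.min(targetCount - 1, Math.floor(c.time / bucketSize));
    const current = bestPerBucket.get(bucket);
    if (!current || c.score > current.score) bestPerBucket.set(bucket, c);
  }

  // Fill slots left by empty buckets with the highest-scoring remaining candidates.
  const picked = new Set<T>(bestPerBucket.values());
  const leftovers = candidates
    .filter(c => !picked.has(c))
    .sort((a, b) => b.score - a.score);
  for (const c of leftovers) {
    if (picked.size >= targetCount) break;
    picked.add(c);
  }

  return [...picked].sort((a, b) => a.time - b.time);
}
```

For example, with three strong changes in the first 3 seconds and one weak change at 50 s of a 60 s video, a plain top-2 ranking keeps the two early frames, while the bucketed version keeps one early frame and the one at 50 s.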
Only render full-resolution images for selected candidates.
async function extractFullResolutionFrame(
  video: HTMLVideoElement,
  time: number,
  type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
  quality?: number
): Promise<Blob> {
  await seekVideo(video, time);

  // One full-resolution canvas per extraction, sized to the native video frame
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("Could not get canvas context");

  ctx.drawImage(video, 0, 0);

  return new Promise((resolve, reject) => {
    canvas.toBlob(
      blob => {
        if (!blob) reject(new Error("Canvas export failed"));
        else resolve(blob);
      },
      type,
      quality // only used for lossy formats (JPEG/WebP)
    );
  });
}
For clipboard, prefer PNG:
async function copyImageBlobToClipboard(blob: Blob) {
  await navigator.clipboard.write([
    new ClipboardItem({
      [blob.type]: blob,
    }),
  ]);
}

const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
await copyImageBlobToClipboard(blob);
Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.
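Because extractFullResolutionFrame has to seek and encode before the blob exists, awaiting it first delays the clipboard write until well after the click. The Clipboard API also accepts a Promise<Blob> as a ClipboardItem value, so one option is to request the write immediately and let the image arrive later. This is a sketch: copyFrameFromClick is a hypothetical name, and how strictly each browser ties clipboard writes to the originating gesture varies, so test it in every target browser.

```typescript
// Requests the clipboard write synchronously (before any await), handing
// ClipboardItem a promise that resolves to the PNG once it has been rendered.
async function copyFrameFromClick(makePng: () => Promise<Blob>): Promise<void> {
  const item = new ClipboardItem({ "image/png": makePng() });
  await navigator.clipboard.write([item]);
}

// Usage inside a click handler; extractFullResolutionFrame, video and candidate
// are the names used earlier in this document:
// copyButton.addEventListener("click", () => {
//   copyFrameFromClick(() =>
//     extractFullResolutionFrame(video, candidate.time, "image/png")
//   ).catch(err => console.error("Copy failed", err));
// });
```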
Minimum controls:
Select video
Scan
Stop scan
Reset
Threshold slider
Sample interval dropdown
Target count
Copy image
Download image
Advanced controls:
“Prefer fewer images”
“Prefer more images”
“Ignore tiny motion”
“Include first frame”
“Manually add current preview frame”

11. State model
Use Zustand or a plain React reducer.
type AppState = {
  video: LoadedVideo | null;
  scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
  settings: ScanSettings;
  candidates: CandidateFrame[];
  selectedCandidateId: string | null;
  progress: number;
  error: string | null;
};
Actions:
type AppActions = {
  loadVideo(file: File): Promise<void>;
  scanVideo(): Promise<void>;
  cancelScan(): void;
  updateSettings(settings: Partial<ScanSettings>): void;
  copyCandidate(candidateId: string): Promise<void>;
  downloadCandidate(candidateId: string): Promise<void>;
  removeCandidate(candidateId: string): void;
};

12. Worker design
For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.
Main thread:
seek video
draw frame to analysis canvas
get ImageData
send ImageData buffer to worker
worker returns diff score
Worker message:
type AnalyzeFrameMessage = {
  type: "analyze-frame";
  current: ImageData;
  previous: ImageData;
  pixelDeltaThreshold: number;
};
Worker response:
type AnalyzeFrameResult = {
  type: "frame-score";
  diffRatio: number;
};
For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.
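A sketch of frameAnalysis.worker.ts following the message shapes above, with one assumed change: it sends the raw Uint8ClampedArray pixel buffers (ImageData.data) instead of ImageData objects. The message handler is only installed when the file actually runs as a worker, so the pure analyze function can also be unit-tested.

```typescript
// frameAnalysis.worker.ts (hypothetical). Raw RGBA buffers replace ImageData here.
type AnalyzeRequest = {
  type: "analyze-frame";
  current: Uint8ClampedArray;
  previous: Uint8ClampedArray;
  pixelDeltaThreshold: number;
};
type AnalyzeResponse = { type: "frame-score"; diffRatio: number };

// Same luma-delta metric as frameDifferenceRatio, on raw buffers.
function analyze(msg: AnalyzeRequest): AnalyzeResponse {
  const { current, previous, pixelDeltaThreshold } = msg;
  let changed = 0;
  for (let i = 0; i < current.length; i += 4) {
    const lumPrev = 0.299 * previous[i] + 0.587 * previous[i + 1] + 0.114 * previous[i + 2];
    const lumCur = 0.299 * current[i] + 0.587 * current[i + 1] + 0.114 * current[i + 2];
    if (Math.abs(lumPrev - lumCur) > pixelDeltaThreshold) changed++;
  }
  const pixels = current.length / 4;
  return { type: "frame-score", diffRatio: pixels === 0 ? 0 : changed / pixels };
}

// Only wire up messaging when running inside a worker global scope.
const scope = globalThis as unknown as {
  WorkerGlobalScope?: unknown;
  onmessage: ((e: { data: AnalyzeRequest }) => void) | null;
  postMessage(msg: AnalyzeResponse): void;
};
if (scope.WorkerGlobalScope !== undefined) {
  scope.onmessage = e => scope.postMessage(analyze(e.data));
}
```

On the main thread, Vite can bundle the worker with new Worker(new URL("./frameAnalysis.worker.ts", import.meta.url), { type: "module" }). At 160×90 each frame is only 57.6 KB, so letting postMessage copy the buffers is cheap; transferring them instead would detach them on the main thread, which breaks reuse of the last accepted frame.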
For a 10-minute video:
Sample interval: 0.5s
Comparisons: ~1200
Analysis resolution: 160×90
Pixels per comparison: 14,400
Total pixel comparisons: ~17.3 million
That is very reasonable in browser JavaScript.
For a 30-minute video:
Sample interval: 0.5s
Comparisons: ~3600
Analysis resolution: 160×90
Total pixel comparisons: ~51.8 million
Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
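The arithmetic above can be captured in a small hypothetical helper, for example to show the user an estimate before a scan starts. It assumes one comparison per sample and RGBA analysis frames; it says nothing about seek/decode time, which is the real bottleneck.

```typescript
// Back-of-envelope workload model (hypothetical helper).
function estimateScanCost(
  durationSeconds: number,
  intervalSeconds: number,
  width: number,
  height: number
) {
  const comparisons = Math.ceil(durationSeconds / intervalSeconds);
  const pixelsPerComparison = width * height;
  return {
    comparisons,
    pixelsPerComparison,
    totalPixelComparisons: comparisons * pixelsPerComparison,
    bytesPerAnalysisFrame: pixelsPerComparison * 4, // RGBA, 4 bytes per pixel
  };
}

// 10 minutes at 0.5 s and 160×90: 1200 comparisons, 17,280,000 pixel comparisons.
const tenMinutes = estimateScanCost(600, 0.5, 160, 90);
```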
Your 100–1000 MB range is practical because:
The video is referenced via a local file blob URL.
You only decode sampled frames.
You only store 4–10 output blobs.
Analysis frames are tiny.
Memory use should look roughly like:
Video decoder buffers: browser-managed
Analysis frame A: ~160 × 90 × 4 = 57.6 KB
Analysis frame B: ~57.6 KB
Full-res output image: only when accepted/copied
Candidate blobs: 4–10 images
At 1920×1080, one raw RGBA canvas frame is about:
1920 × 1080 × 4 = ~8.3 MB
At 3840×2160:
3840 × 2160 × 4 = ~33.2 MB
Still fine if you only create one full-res canvas at a time.
Start with Chromium-based browsers:
Chrome
Edge
Brave
Arc
Chromium
Reasons:
Strong media support.
Good Clipboard API support.
Good WebCodecs path for later.
Firefox should work for the V1 HTMLVideoElement + canvas path. MDN lists WebCodecs VideoDecoder as supported in Firefox 130+, while Chromium browsers have supported it since version 94.
Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.
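Rather than sniffing browser names, the app can feature-test what it needs. A sketch, assuming the flags are only used for UI messaging: detectCapabilities and its field names are hypothetical, and ClipboardItem.supports() is a newer static method that may be missing, in which case the sketch optimistically assumes PNG support.

```typescript
type Capabilities = {
  secureContext: boolean;  // required for the Clipboard API
  clipboardWrite: boolean; // navigator.clipboard.write + ClipboardItem exist
  clipboardPng: boolean;   // PNG accepted (or assumed) by ClipboardItem
  webCodecs: boolean;      // VideoDecoder exists, for a later WebCodecs path
};

function detectCapabilities(): Capabilities {
  const g = globalThis as any;
  const clipboardWrite =
    typeof g.ClipboardItem === "function" &&
    typeof g.navigator?.clipboard?.write === "function";
  // Use ClipboardItem.supports() when available; otherwise assume PNG works.
  const clipboardPng =
    clipboardWrite &&
    (typeof g.ClipboardItem.supports === "function"
      ? Boolean(g.ClipboardItem.supports("image/png"))
      : true);
  return {
    secureContext: g.isSecureContext === true,
    clipboardWrite,
    clipboardPng,
    webCodecs: typeof g.VideoDecoder === "function",
  };
}
```

Codec support is a separate question: HTMLMediaElement.canPlayType(file.type) returns a coarse "", "maybe" or "probably" hint, but the loadedmetadata event actually firing is the real test.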
The app should explicitly state:
Your video is processed locally in your browser. The file is not uploaded. No server receives the video or extracted images.
Implementation requirements:
Do not send file contents to analytics.
Do not log object URLs remotely.
Avoid third-party scripts if privacy is part of the value prop.
Serve over HTTPS.
Make the app installable as a PWA if desired.

17. Error handling
Handle:
Case: Response
Unsupported video codec: Show “This browser cannot decode this video.”
Huge resolution: Offer lower-quality extraction or warn about memory.
Clipboard blocked: Offer download fallback.
Scan too slow: Let user increase sample interval.
Too many candidates: Raise threshold or bucket results.
No candidates found: Lower threshold or include first/middle/end frames.
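The clipboard fallback relies on a downloadBlob helper that the document does not define. A minimal sketch, assuming a browser DOM: it uses a temporary object URL and an anchor with the download attribute. The one-second delay before revoking is an assumed safety margin so the download can start first.

```typescript
// Hypothetical helper: save a Blob to disk via a temporary object URL.
function downloadBlob(blob: Blob, filename: string): void {
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = filename; // suggested file name
  document.body.appendChild(a);
  a.click();
  a.remove();
  // Revoke after a short delay so the browser has time to start the download.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```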
Clipboard fallback:
try {
  await copyImageBlobToClipboard(blob);
} catch {
  downloadBlob(blob, "frame.png");
}

18. Suggested file structure

src/
  app/
    App.tsx
    store.ts
  components/
    FilePicker.tsx
    VideoMetadataPanel.tsx
    ScanSettingsPanel.tsx
    ProgressPanel.tsx
    CandidateGallery.tsx
    CandidateCard.tsx
  media/
    loadVideo.ts
    seekVideo.ts
    captureFrame.ts
    extractFullFrame.ts
    clipboard.ts
  analysis/
    frameDifference.ts
    histogramDifference.ts
    candidateRanking.ts
    scanVideo.ts
  workers/
    frameAnalysis.worker.ts
  types/
    media.ts
    scan.ts

19. MVP implementation plan

Phase 1: Core prototype
Build:
File picker
Hidden video loader
Metadata display
Scan button
Basic frame-difference detection
Candidate list with timestamps
Thumbnail generation
No worker yet.
Phase 2: Copy/export
Add:
Full-res frame extraction
Copy PNG to clipboard
Download PNG fallback
Remove candidate
Manual add candidate

Phase 3: Better detection
Histogram comparison
Temporal bucketing
Threshold presets
“Strict / Balanced / Sensitive” modes
Example presets:
const presets = {
  strict: {
    sampleIntervalSeconds: 1.0,
    changedPixelRatioThreshold: 0.28,
    minSecondsBetweenCaptures: 4,
  },
  balanced: {
    sampleIntervalSeconds: 0.5,
    changedPixelRatioThreshold: 0.18,
    minSecondsBetweenCaptures: 2,
  },
  sensitive: {
    sampleIntervalSeconds: 0.25,
    changedPixelRatioThreshold: 0.1,
    minSecondsBetweenCaptures: 1,
  },
};

Phase 4: Worker optimization
Move frame comparison into a worker.
Phase 5: Advanced media backend
Evaluate:
Mediabunny + WebCodecs
This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.
For your app, build V1 with:
Vite + React + TypeScript
HTMLVideoElement
Canvas
Blob URLs
Clipboard API
Optional: Zustand
Do not start with:
ffmpeg.wasm
OpenCV.js
Server-side processing
Best V1 strategy:
Sample every 0.5s
Analyze at 160×90
Compare against last accepted key frame
Use threshold + minimum time gap
Rank candidates down to 4–10
Extract full-res only when displaying/copying