docs: add design document for local video key-frame extractor
@@ -0,0 +1,929 @@

Recommended approach

Use:

    <input type="file">
      → URL.createObjectURL(file)
      → hidden <video>
      → sample frames periodically
      → draw sampled frame to canvas
      → compare against previous accepted frame
      → keep only significant-change frames
      → user clicks Copy on selected image
      → navigator.clipboard.write()

URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.

Do not analyze every frame at full resolution

For change detection, use a small analysis canvas:

    video frame → draw to 160×90 or 320×180 canvas → compare pixels

Then only render the full-resolution frame when you decide it is worth keeping.

Example strategy:

    const ANALYSIS_W = 160;
    const ANALYSIS_H = 90;
    const SAMPLE_INTERVAL = 0.5;   // seconds between sampled frames
    const CHANGE_THRESHOLD = 0.18; // fraction of changed pixels; tune this

That means a 10-minute video sampled every 0.5 seconds is only:

    10 min × 60 s/min / 0.5 s = 1200 comparisons

At 160×90, that is cheap.

Simple frame-difference algorithm

Use a downscaled canvas and compare luminance differences:

    // a and b are ImageData objects of identical size.
    // Returns the fraction of pixels whose luminance changed noticeably.
    function frameDifference(a, b) {
      let changed = 0;
      const pixels = a.data.length / 4; // RGBA: 4 bytes per pixel

      for (let i = 0; i < a.data.length; i += 4) {
        const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
        const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];

        if (Math.abs(lumA - lumB) > 32) changed++;
      }

      return changed / pixels;
    }

Then:

    if (diff > CHANGE_THRESHOLD) {
      // save this timestamp as a candidate key frame
    }

Better heuristic

Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.

That avoids saving several frames during the same transition.

    Frame A accepted
    Frame B slightly different → reject
    Frame C slightly different → reject
    Frame D substantially different from A → accept

Also add a minimum time gap:

    const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;

Clipboard copying

Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.

Example:

    async function copyCanvasToClipboard(canvas) {
      const blob = await new Promise(resolve =>
        canvas.toBlob(resolve, "image/png")
      );
      // toBlob() passes null to its callback when the image cannot be encoded.
      if (!blob) throw new Error("Canvas export failed");

      await navigator.clipboard.write([
        new ClipboardItem({
          "image/png": blob
        })
      ]);
    }

Important: this should happen from a user gesture, like clicking a Copy button.

Memory profile

This is light if implemented correctly:

1 local video blob URL
1 hidden video element
1 small analysis canvas
1 full-res preview canvas/image
4–10 saved output blobs

Avoid:

Storing hundreds/thousands of full-res canvases
Storing every sampled frame
Using FileReader.readAsDataURL(video)
Exporting a huge ZIP unless needed

For your target, keeping 4–10 PNG/WebP blobs in memory is fine.

Practical UX flow

I would build it like this:

User selects video.
App scans with adjustable settings:
  sample interval
  change threshold
  minimum seconds between captures
  max captures
App shows candidate thumbnails.
User deletes bad picks if needed.
Each thumbnail has:
  Copy PNG
  Download
  timestamp
Optional: button to rescan with stricter/looser threshold.

Caveats

Plain <video> seeking may not be perfectly frame-accurate.
Sampling every 0.25–1.0s is much more practical than decoding every frame.
Browser codec support matters. MP4/H.264 is safest.
Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.
For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.

My recommended V1 stack

React/Vite
HTMLVideoElement
Canvas API
Blob/Object URLs
navigator.clipboard.write()
Optional: localForage or IndexedDB if you want session persistence

No backend required. No upload required.

What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.

I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.

Recommendation

Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:

Vite + React + TypeScript
HTMLVideoElement
Canvas / OffscreenCanvas
Web Worker
Blob URLs
Clipboard API

For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a browser/WebAssembly port of FFmpeg, but many real-world reports and project issues revolve around large-file memory limits and browser stability.

For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.

Design Document: Local Video Key-Frame Extractor

1. Goal

Build a browser-only app that allows a user to:

Select a local video file.
Analyze it without uploading to a server.
Detect frames where the image content has significantly changed.
Present a small set of candidate key images.
Let the user copy selected images to the clipboard.

Target input:

Video size: 100–1000 MB
Expected output: 4–10 still images
Processing model: local-only
Backend: none required

2. Recommended architecture

    User selects video
      ↓
    Create local Blob URL
      ↓
    Hidden <video> element loads local file
      ↓
    Sampling engine seeks through video
      ↓
    Downscaled analysis canvas compares frames
      ↓
    Candidate detector stores timestamps
      ↓
    Full-resolution renderer extracts selected frames
      ↓
    Gallery shows thumbnails
      ↓
    User copies PNG to clipboard

Core browser APIs:

Task                      API
Local file selection      File API
Local video loading       URL.createObjectURL()
Video decoding/seeking    HTMLVideoElement
Pixel comparison          Canvas API
Off-main-thread analysis  Worker + OffscreenCanvas where available
Image export              canvas.toBlob()
Clipboard copy            navigator.clipboard.write() + ClipboardItem

URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.

3. Framework choice

Best V1 stack

Vite
React
TypeScript
Zustand or simple React state
Canvas API
Web Worker

Why:

Vite: fast local dev, easy static deployment.
React: good for gallery, controls, progress UI, and stateful review flow.
TypeScript: useful because media processing code quickly accumulates edge cases.
No backend: app can deploy as static files.
No heavy media dependency initially: browser decoding is enough for threshold-based change detection.

Suggested install:

    npm create vite@latest video-keyframe-extractor -- --template react-ts
    cd video-keyframe-extractor
    npm install
    npm run dev

Optional dependencies:

    npm install zustand

I would avoid extra UI frameworks unless you want polished controls quickly.

4. Library/API decision matrix

Option                   Use for V1?     Notes
Native <video> + canvas  Yes             Best starting point. Simple, no server, no large WASM payload.
Web Worker               Yes             Keeps UI responsive during scans.
OffscreenCanvas          Yes, optional   Good for worker-side pixel analysis where supported.
WebCodecs                V2              Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames.
Mediabunny               V2 / advanced   Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed.
ffmpeg.wasm              Avoid for V1    Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection.
OpenCV.js                Probably avoid  Useful for advanced CV, but heavy for simple scene-difference detection.

5. App modules

5.1 File intake module

Responsibilities:

Accept local video file.
Validate file type and size.
Create blob URL.
Load video metadata.
Display duration, resolution, codec/browser compatibility status if possible.

Example state:

    type LoadedVideo = {
      file: File;
      objectUrl: string; // from URL.createObjectURL(file)
      duration: number;  // seconds, from video.duration
      width: number;     // from video.videoWidth
      height: number;    // from video.videoHeight
    };

Important cleanup:

    URL.revokeObjectURL(objectUrl);

Never use this for the video:

    new FileReader().readAsDataURL(file);

That would load the whole video into memory and inflate it by roughly a third through base64 encoding.
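
The "validate file type and size" responsibility can be sketched as a small pure function. This is only an illustration: the name validateVideoFile, the FileLike shape, and the 1000 MB ceiling (taken from the target range in this document) are assumptions, and a MIME-type check is only a hint, since some containers are reported with an empty type. Whether the hidden <video> element actually loads metadata remains the real test.

```typescript
// Hypothetical limit, based on the 100–1000 MB target range stated in this document.
const MAX_VIDEO_BYTES = 1000 * 1024 * 1024;

// Structural subset of the browser File object, so the check is easy to test.
type FileLike = { name: string; type: string; size: number };

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateVideoFile(
  file: FileLike,
  maxBytes: number = MAX_VIDEO_BYTES
): ValidationResult {
  // file.type is the browser-reported MIME type; it can be empty for unknown extensions.
  if (!file.type.startsWith("video/")) {
    return { ok: false, reason: `Not a video file: ${file.type || "unknown type"}` };
  }
  if (file.size === 0) {
    return { ok: false, reason: "File is empty" };
  }
  if (file.size > maxBytes) {
    const limitMb = Math.round(maxBytes / (1024 * 1024));
    return { ok: false, reason: `File is larger than ${limitMb} MB` };
  }
  return { ok: true };
}
```

A File from <input type="file"> satisfies FileLike, so it can be passed in directly before the blob URL is created.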

5.2 Scan settings module

Expose these controls:

    type ScanSettings = {
      sampleIntervalSeconds: number;      // default: 0.5
      analysisWidth: number;              // default: 160
      analysisHeight: number;             // default: 90
      pixelDeltaThreshold: number;        // default: 32
      changedPixelRatioThreshold: number; // default: 0.18
      minSecondsBetweenCaptures: number;  // default: 2.0
      maxCandidates: number;              // default: 20
      finalTargetCount: number;           // default: 8
    };

Suggested defaults:

    const defaultSettings: ScanSettings = {
      sampleIntervalSeconds: 0.5,
      analysisWidth: 160,
      analysisHeight: 90,
      pixelDeltaThreshold: 32,
      changedPixelRatioThreshold: 0.18,
      minSecondsBetweenCaptures: 2.0,
      maxCandidates: 20,
      finalTargetCount: 8,
    };

5.3 Frame sampling module

V1 should seek using a hidden video element:

    // Resolves once the video has finished seeking to `time` (in seconds).
    async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
      return new Promise((resolve, reject) => {
        const onSeeked = () => {
          cleanup();
          resolve();
        };

        const onError = () => {
          cleanup();
          reject(video.error);
        };

        // Remove both listeners so whichever one did not fire does not linger.
        const cleanup = () => {
          video.removeEventListener("seeked", onSeeked);
          video.removeEventListener("error", onError);
        };

        video.addEventListener("seeked", onSeeked, { once: true });
        video.addEventListener("error", onError, { once: true });

        // Clamp so a sample time never points past the end of the video.
        video.currentTime = Math.min(time, video.duration);
      });
    }

Then draw into a small analysis canvas:

    function captureAnalysisFrame(
      video: HTMLVideoElement,
      canvas: HTMLCanvasElement,
      width: number,
      height: number
    ): ImageData {
      // Assigning width/height resets the canvas, so only do it when the size changes.
      if (canvas.width !== width) canvas.width = width;
      if (canvas.height !== height) canvas.height = height;

      const ctx = canvas.getContext("2d", { willReadFrequently: true });
      if (!ctx) throw new Error("Could not get canvas context");

      // Downscale the current video frame and read its pixels back.
      ctx.drawImage(video, 0, 0, width, height);
      return ctx.getImageData(0, 0, width, height);
    }

6. Change detection algorithm

6.1 Simple and effective V1

Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.

This prevents over-detecting long transitions.

    function frameDifferenceRatio(
      a: ImageData,
      b: ImageData,
      pixelDeltaThreshold: number
    ): number {
      const dataA = a.data;
      const dataB = b.data;
      let changed = 0;
      const pixels = dataA.length / 4;

      for (let i = 0; i < dataA.length; i += 4) {
        const lumA =
          0.299 * dataA[i] +
          0.587 * dataA[i + 1] +
          0.114 * dataA[i + 2];

        const lumB =
          0.299 * dataB[i] +
          0.587 * dataB[i + 1] +
          0.114 * dataB[i + 2];

        if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
          changed++;
        }
      }

      return changed / pixels;
    }

Acceptance logic:

    function shouldAcceptCandidate(params: {
      diffRatio: number;
      currentTime: number;
      lastAcceptedTime: number | null;
      changedPixelRatioThreshold: number;
      minSecondsBetweenCaptures: number;
    }) {
      const {
        diffRatio,
        currentTime,
        lastAcceptedTime,
        changedPixelRatioThreshold,
        minSecondsBetweenCaptures,
      } = params;

      if (diffRatio < changedPixelRatioThreshold) return false;

      if (
        lastAcceptedTime !== null &&
        currentTime - lastAcceptedTime < minSecondsBetweenCaptures
      ) {
        return false;
      }

      return true;
    }

6.2 Better V1.5: histogram difference

Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.

A better approach is to compare low-resolution color/luma histograms.

Recommended hybrid score:

    score = 0.7 × luminance histogram difference
          + 0.3 × pixel difference ratio

This tends to better detect scene/content changes instead of tiny motion.
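
One way the histogram half of that score could look is sketched below. The 32-bin count, the half-L1 distance, and the names RgbaFrame, lumaHistogram, histogramDifference and hybridScore are assumptions made for illustration; only the 0.7 / 0.3 weights come from the text. RgbaFrame is a structural stand-in for ImageData, so a real ImageData can be passed as-is.

```typescript
// Structural stand-in for the browser's ImageData (assumption, for testability).
type RgbaFrame = { data: Uint8ClampedArray; width: number; height: number };

const LUMA_BINS = 32; // assumption: 32 bins, i.e. 8 luma levels per bin

// Normalised luma histogram: each bin holds the fraction of pixels that fall into it.
function lumaHistogram(frame: RgbaFrame, bins: number = LUMA_BINS): Float64Array {
  const hist = new Float64Array(bins);
  const d = frame.data;
  const pixels = d.length / 4;
  for (let i = 0; i < d.length; i += 4) {
    const luma = 0.299 * d[i] + 0.587 * d[i + 1] + 0.114 * d[i + 2]; // roughly 0..255
    const bin = Math.min(bins - 1, Math.floor((luma / 256) * bins));
    hist[bin] += 1 / pixels;
  }
  return hist;
}

// Half the L1 distance between two normalised histograms:
// 0 means identical distributions, 1 means no overlap at all.
function histogramDifference(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / 2;
}

// Hybrid score using the 0.7 / 0.3 weights suggested in the text.
function hybridScore(histDiff: number, pixelDiffRatio: number): number {
  return 0.7 * histDiff + 0.3 * pixelDiffRatio;
}
```

Because both inputs lie in the range 0 to 1, the hybrid score does too, so it can be compared against the same kind of threshold as the plain pixel ratio, although the threshold value itself would need retuning.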

6.3 Optional V2: perceptual hash

For more stable detection:

Generate a small grayscale frame, e.g. 32×32.
Compute average hash or difference hash.
Compare Hamming distance.
Accept when distance exceeds threshold.

This is good when videos have subtle compression differences but meaningful visual changes.
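
The hash-and-compare steps can be sketched with a difference hash, where each bit records whether a pixel is brighter than its right-hand neighbour. The function names and the flat row-major grayscale input are assumptions; the sketch expects the caller to have already downscaled the frame and converted it to luma, and it works for any grid that is one column wider than the number of bits per row.

```typescript
// Difference hash (dHash) over a grayscale grid of (cols + 1) × rows values, row-major.
// Assumption: the caller has already downscaled the frame and converted it to luma.
function differenceHash(gray: ArrayLike<number>, cols: number, rows: number): Uint8Array {
  const width = cols + 1;
  if (gray.length !== width * rows) {
    throw new Error("Grid must contain (cols + 1) * rows values");
  }
  const bits = new Uint8Array(cols * rows);
  for (let y = 0; y < rows; y++) {
    for (let x = 0; x < cols; x++) {
      const left = gray[y * width + x];
      const right = gray[y * width + x + 1];
      bits[y * cols + x] = left > right ? 1 : 0; // 1 when brightness falls left to right
    }
  }
  return bits;
}

// Number of bit positions in which two equal-length hashes differ.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}
```

A frame would then be accepted when hammingDistance(currentHash, lastAcceptedHash) exceeds a tuned threshold, in the same position where the pixel-ratio check sits in section 6.1.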

7. Scan workflow

Pseudo-code:

    async function scanVideo(
      video: HTMLVideoElement,
      settings: ScanSettings,
      onProgress: (progress: number) => void
    ): Promise<CandidateFrame[]> {
      const analysisCanvas = document.createElement("canvas");

      const candidates: CandidateFrame[] = [];
      let lastAcceptedFrame: ImageData | null = null;
      let lastAcceptedTime: number | null = null;

      for (
        let time = 0;
        time < video.duration;
        time += settings.sampleIntervalSeconds
      ) {
        await seekVideo(video, time);

        const imageData = captureAnalysisFrame(
          video,
          analysisCanvas,
          settings.analysisWidth,
          settings.analysisHeight
        );

        if (!lastAcceptedFrame) {
          // The first sampled frame is always kept and becomes the baseline.
          candidates.push({
            id: crypto.randomUUID(),
            time,
            score: 1,
            reason: "initial-frame",
          });

          lastAcceptedFrame = imageData;
          lastAcceptedTime = time;
          continue;
        }

        // Compare against the last accepted key frame, not the previous sample.
        const diffRatio = frameDifferenceRatio(
          lastAcceptedFrame,
          imageData,
          settings.pixelDeltaThreshold
        );

        if (
          shouldAcceptCandidate({
            diffRatio,
            currentTime: time,
            lastAcceptedTime,
            changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
            minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
          })
        ) {
          candidates.push({
            id: crypto.randomUUID(),
            time,
            score: diffRatio,
            reason: "visual-change",
          });

          lastAcceptedFrame = imageData;
          lastAcceptedTime = time;
        }

        onProgress(time / video.duration);

        if (candidates.length >= settings.maxCandidates) {
          // optional: stop early or continue and later rank candidates
        }
      }

      onProgress(1); // report completion

      return rankAndTrimCandidates(candidates, settings.finalTargetCount);
    }

Candidate type:

    type CandidateFrame = {
      id: string; // lets the state model (section 11) refer to a candidate by id
      time: number;
      score: number;
      reason: "initial-frame" | "visual-change" | "manual";
      thumbnailUrl?: string;
      fullBlob?: Blob;
    };
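
When the sample interval is not exactly representable in binary floating point (0.1 s, for example), accumulating time by repeated addition can drift and produce one sample more or fewer than intended near the end of the video. A minimal sketch of deriving each timestamp from an integer index instead; sampleTimes is a hypothetical helper, not part of the design above:

```typescript
// Returns the sample timestamps 0, interval, 2 * interval, ... that are strictly below duration.
// Each value is computed as index * interval, so rounding error does not accumulate.
function sampleTimes(duration: number, interval: number): number[] {
  // Also rejects NaN, which video.duration reports before metadata has loaded.
  if (!(duration > 0) || !(interval > 0)) return [];
  const times: number[] = [];
  for (let i = 0; i * interval < duration; i++) {
    times.push(i * interval);
  }
  return times;
}
```

The scan loop could then iterate over sampleTimes(video.duration, settings.sampleIntervalSeconds), which also makes the total sample count known up front for progress reporting.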

8. Candidate ranking

Since the desired final count is only 4–10, do not simply take every threshold crossing.

Use ranking:

Always include first frame if useful.
Sort detected changes by score.
Enforce time diversity.
Keep the top N.

Example:

    function rankAndTrimCandidates(
      candidates: CandidateFrame[],
      targetCount: number
    ): CandidateFrame[] {
      return [...candidates]
        .sort((a, b) => b.score - a.score) // strongest changes first
        .slice(0, targetCount)
        .sort((a, b) => a.time - b.time);  // back to chronological order
    }

Better version:

Divide video into N temporal buckets
Pick the strongest candidate from each bucket
Then fill remaining slots by highest score

This avoids ending up with all images from one active segment.
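
The bucketed variant could be sketched roughly as follows. The function name, the Scored shape (a subset of CandidateFrame), and the choice of one bucket per target slot are assumptions made for illustration.

```typescript
// Minimal shape needed for ranking; CandidateFrame from section 7 satisfies it.
type Scored = { time: number; score: number };

function rankWithTemporalBuckets<T extends Scored>(
  candidates: T[],
  duration: number,
  targetCount: number
): T[] {
  if (targetCount <= 0 || candidates.length === 0) return [];

  // Assumption: one bucket per target slot, each covering duration / targetCount seconds.
  const bucketLength = duration / targetCount;
  const bestPerBucket = new Map<number, T>();

  // Pass 1: keep the strongest candidate in each temporal bucket.
  for (const c of candidates) {
    const bucket =
      duration > 0
        ? Math.min(targetCount - 1, Math.floor(c.time / bucketLength))
        : 0;
    const current = bestPerBucket.get(bucket);
    if (!current || c.score > current.score) bestPerBucket.set(bucket, c);
  }

  const picked = new Set<T>(Array.from(bestPerBucket.values()));

  // Pass 2: fill any remaining slots with the highest-scoring leftovers.
  const rest = candidates
    .filter(c => !picked.has(c))
    .sort((a, b) => b.score - a.score);
  for (const c of rest) {
    if (picked.size >= targetCount) break;
    picked.add(c);
  }

  // Return in chronological order for display.
  return Array.from(picked).sort((a, b) => a.time - b.time);
}
```

With candidates clustered in the first few seconds plus one weak candidate near the end, a plain top-N sort would drop the late one, while this version keeps it because it is the best candidate in its own bucket.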

9. Full-resolution extraction

Only render full-resolution images for selected candidates.

    async function extractFullResolutionFrame(
      video: HTMLVideoElement,
      time: number,
      type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
      quality?: number
    ): Promise<Blob> {
      await seekVideo(video, time);

      const canvas = document.createElement("canvas");
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;

      const ctx = canvas.getContext("2d");
      if (!ctx) throw new Error("Could not get canvas context");

      ctx.drawImage(video, 0, 0);

      return new Promise((resolve, reject) => {
        canvas.toBlob(
          blob => {
            if (!blob) reject(new Error("Canvas export failed"));
            else resolve(blob);
          },
          type,
          quality
        );
      });
    }

For clipboard, prefer PNG:

    async function copyImageBlobToClipboard(blob: Blob) {
      await navigator.clipboard.write([
        new ClipboardItem({
          [blob.type]: blob,
        }),
      ]);
    }

Use:

    const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
    await copyImageBlobToClipboard(blob);

Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.

10. UI design

Main layout

    ┌──────────────────────────────────────────────┐
    │ Video Key Frame Extractor                    │
    ├──────────────────────────────────────────────┤
    │ [Select video]  filename.mp4                 │
    │ Duration: 04:32 | Size: 612 MB | 1920×1080   │
    ├──────────────────────────────────────────────┤
    │ Scan Settings                                │
    │ Sample interval:  [0.5s]                     │
    │ Change threshold: [18%]                      │
    │ Min gap:          [2.0s]                     │
    │ Target images:    [8]                        │
    │ [Scan Video]                                 │
    ├──────────────────────────────────────────────┤
    │ Progress bar                                 │
    ├──────────────────────────────────────────────┤
    │ Candidate Gallery                            │
    │ [thumb] 00:04.5 score .24 [Copy] [Download]  │
    │ [thumb] 00:18.0 score .41 [Copy] [Download]  │
    │ [thumb] 01:22.5 score .37 [Copy] [Download]  │
    └──────────────────────────────────────────────┘

Controls

Minimum controls:

Select video
Scan
Stop scan
Reset
Threshold slider
Sample interval dropdown
Target count
Copy image
Download image

Advanced controls:

“Prefer fewer images”
“Prefer more images”
“Ignore tiny motion”
“Include first frame”
“Manually add current preview frame”
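
The gallery rows in the layout mock-up label each thumbnail with a timestamp such as 00:04.5. A small sketch of producing that MM:SS.s label from a candidate's time in seconds; formatTimestamp is a hypothetical helper name, and the exact label format is inferred from the mock-up rather than specified anywhere:

```typescript
// Formats a time in seconds as MM:SS.s (minutes, seconds, tenths of a second).
function formatTimestamp(seconds: number): string {
  // Work in whole tenths so a value like 59.96 rolls over to 01:00.0 instead of 00:60.0.
  const tenths = Math.round(Math.max(0, seconds) * 10);
  const minutes = Math.floor(tenths / 600);
  const secs = Math.floor((tenths % 600) / 10);
  const frac = tenths % 10;
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `${pad(minutes)}:${pad(secs)}.${frac}`;
}
```

Videos longer than 99 minutes would simply show a three-digit minute count; an hours field could be added if that matters.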

11. State model

Use Zustand or a plain React reducer.

    type AppState = {
      video: LoadedVideo | null;
      scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
      settings: ScanSettings;
      candidates: CandidateFrame[];
      selectedCandidateId: string | null;
      progress: number; // 0..1
      error: string | null;
    };

Actions:

    type AppActions = {
      loadVideo(file: File): Promise<void>;
      scanVideo(): Promise<void>;
      cancelScan(): void;
      updateSettings(settings: Partial<ScanSettings>): void;
      copyCandidate(candidateId: string): Promise<void>;
      downloadCandidate(candidateId: string): Promise<void>;
      removeCandidate(candidateId: string): void;
    };

12. Worker design

For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.

Main thread:

seek video
draw frame to analysis canvas
get ImageData
send ImageData buffer to worker
worker returns diff score

Worker message:

    type AnalyzeFrameMessage = {
      type: "analyze-frame";
      current: ImageData;
      previous: ImageData;
      pixelDeltaThreshold: number;
    };

Worker response:

    type AnalyzeFrameResult = {
      type: "frame-score";
      diffRatio: number;
    };

For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.
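
One way to keep that move cheap is to write the worker-side logic as a pure function from message to result, so the same code runs on the main thread in V1 and behind onmessage later. The sketch below restates the two message types with a structural FrameLike in place of ImageData (an assumption, so it can run without a DOM); handleAnalyzeFrame is a hypothetical name.

```typescript
// Structural stand-in for ImageData; a real ImageData satisfies it.
type FrameLike = { data: Uint8ClampedArray };

type AnalyzeFrameMessage = {
  type: "analyze-frame";
  current: FrameLike;
  previous: FrameLike;
  pixelDeltaThreshold: number;
};

type AnalyzeFrameResult = {
  type: "frame-score";
  diffRatio: number;
};

// Pure handler: the same luminance comparison as frameDifferenceRatio in section 6.1.
function handleAnalyzeFrame(msg: AnalyzeFrameMessage): AnalyzeFrameResult {
  const a = msg.previous.data;
  const b = msg.current.data;
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    const lumA = 0.299 * a[i] + 0.587 * a[i + 1] + 0.114 * a[i + 2];
    const lumB = 0.299 * b[i] + 0.587 * b[i + 1] + 0.114 * b[i + 2];
    if (Math.abs(lumA - lumB) > msg.pixelDeltaThreshold) changed++;
  }
  return { type: "frame-score", diffRatio: changed / (a.length / 4) };
}

// Inside frameAnalysis.worker.ts the wiring would be roughly:
//   self.onmessage = (e: MessageEvent<AnalyzeFrameMessage>) =>
//     self.postMessage(handleAnalyzeFrame(e.data));
```

Since the comparison always runs against the last accepted frame, the worker could also cache that frame itself so only the current frame has to be posted for each sample.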

13. Performance expectations

For a 10-minute video:

Sample interval: 0.5s
Comparisons: ~1200
Analysis resolution: 160×90
Pixels per comparison: 14,400
Total pixel comparisons: ~17.3 million

That is very reasonable in browser JavaScript.

For a 30-minute video:

Sample interval: 0.5s
Comparisons: ~3600
Analysis resolution: 160×90
Total pixel comparisons: ~51.8 million

Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
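
The arithmetic behind those figures is simple enough to compute from the scan settings, for instance to show an estimate in the UI before a scan starts. A minimal sketch; estimateScanCost is a hypothetical helper name:

```typescript
type ScanCost = {
  comparisons: number;           // number of sampled frames
  pixelsPerComparison: number;   // analysis width × height
  totalPixelComparisons: number; // product of the two
};

function estimateScanCost(
  durationSeconds: number,
  sampleIntervalSeconds: number,
  analysisWidth: number,
  analysisHeight: number
): ScanCost {
  const comparisons = Math.ceil(durationSeconds / sampleIntervalSeconds);
  const pixelsPerComparison = analysisWidth * analysisHeight;
  return {
    comparisons,
    pixelsPerComparison,
    totalPixelComparisons: comparisons * pixelsPerComparison,
  };
}

// 10-minute video at 0.5 s and 160×90: 1200 comparisons, 17,280,000 pixel comparisons.
const tenMinutes = estimateScanCost(600, 0.5, 160, 90);
```

This covers only the pixel math; it says nothing about seek and decode time, which is the more likely bottleneck.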

14. File-size considerations

Your 100–1000 MB range is practical because:

The video is referenced via local file blob URL.
You only decode sampled frames.
You only store 4–10 output blobs.
Analysis frames are tiny.

Memory use should look roughly like:

Video decoder buffers: browser-managed
Analysis frame A: ~160 × 90 × 4 = 57.6 KB
Analysis frame B: ~57.6 KB
Full-res output image: only when accepted/copied
Candidate blobs: 4–10 images

At 1920×1080, one raw RGBA canvas frame is about:

    1920 × 1080 × 4 = ~8.3 MB

At 3840×2160:

    3840 × 2160 × 4 = ~33.2 MB

Still fine if you only create one full-res canvas at a time.
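
The same width × height × 4 calculation can be wrapped in a helper, together with a way to cap the extraction size for very large sources. Both function names are hypothetical, and the idea of a pixel budget is an assumption layered on the figures above rather than part of the design.

```typescript
// Bytes needed for one uncompressed RGBA frame (4 bytes per pixel).
function rawRgbaBytes(width: number, height: number): number {
  return width * height * 4;
}

// Scales dimensions down (never up) so width × height stays within maxPixels,
// preserving the aspect ratio as closely as integer sizes allow.
function fitWithinPixelBudget(
  width: number,
  height: number,
  maxPixels: number
): { width: number; height: number } {
  const pixels = width * height;
  if (pixels <= maxPixels) return { width, height };
  const scale = Math.sqrt(maxPixels / pixels);
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
  };
}
```

For example, capping a 3840×2160 source at a 1920×1080 pixel budget yields 1920×1080, cutting the raw canvas from about 33.2 MB to about 8.3 MB.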

15. Browser compatibility

Best target

Start with Chromium-based browsers:

Chrome
Edge
Brave
Arc
Chromium

Reasons:

Strong media support.
Good Clipboard API support.
Good WebCodecs path for later.

Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.

Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.

16. Security and privacy model

The app should explicitly state:

Your video is processed locally in your browser.
The file is not uploaded.
No server receives the video or extracted images.

Implementation requirements:

Do not send file contents to analytics.
Do not log object URLs remotely.
Avoid third-party scripts if privacy is part of the value prop.
Serve over HTTPS.
Make the app installable as a PWA if desired.

17. Error handling

Handle:

Case                     Response
Unsupported video codec  Show “This browser cannot decode this video.”
Huge resolution          Offer lower-quality extraction or warn about memory.
Clipboard blocked        Offer download fallback.
Scan too slow            Let user increase sample interval.
Too many candidates      Raise threshold or bucket results.
No candidates found      Lower threshold or include first/middle/end frames.

Clipboard fallback:

    try {
      await copyImageBlobToClipboard(blob);
    } catch {
      // downloadBlob is a helper the app still has to provide; it is not defined in this document.
      downloadBlob(blob, "frame.png");
    }

18. Suggested file structure

    src/
      app/
        App.tsx
        store.ts
      components/
        FilePicker.tsx
        VideoMetadataPanel.tsx
        ScanSettingsPanel.tsx
        ProgressPanel.tsx
        CandidateGallery.tsx
        CandidateCard.tsx
      media/
        loadVideo.ts
        seekVideo.ts
        captureFrame.ts
        extractFullFrame.ts
        clipboard.ts
      analysis/
        frameDifference.ts
        histogramDifference.ts
        candidateRanking.ts
        scanVideo.ts
      workers/
        frameAnalysis.worker.ts
      types/
        media.ts
        scan.ts

19. MVP implementation plan

Phase 1: Core prototype

Build:

File picker
Hidden video loader
Metadata display
Scan button
Basic frame-difference detection
Candidate list with timestamps
Thumbnail generation

No worker yet.

Phase 2: Copy/export

Add:

Full-res frame extraction
Copy PNG to clipboard
Download PNG fallback
Remove candidate
Manual add candidate

Phase 3: Better detection

Add:

Histogram comparison
Temporal bucketing
Threshold presets
“Strict / Balanced / Sensitive” modes

Example presets:

    const presets = {
      strict: {
        sampleIntervalSeconds: 1.0,
        changedPixelRatioThreshold: 0.28,
        minSecondsBetweenCaptures: 4,
      },
      balanced: {
        sampleIntervalSeconds: 0.5,
        changedPixelRatioThreshold: 0.18,
        minSecondsBetweenCaptures: 2,
      },
      sensitive: {
        sampleIntervalSeconds: 0.25,
        changedPixelRatioThreshold: 0.1,
        minSecondsBetweenCaptures: 1,
      },
    };
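
Each preset only touches three of the scan settings, so applying one can be a shallow merge over the current settings. A minimal sketch, assuming the preset values above; PresetFields, PresetName and applyPreset are hypothetical names, and the presets are restated here only so the block is self-contained.

```typescript
// The three settings a preset overrides.
type PresetFields = {
  sampleIntervalSeconds: number;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
};

type PresetName = "strict" | "balanced" | "sensitive";

// Same values as the example presets in the text.
const scanPresets: Record<PresetName, PresetFields> = {
  strict: { sampleIntervalSeconds: 1.0, changedPixelRatioThreshold: 0.28, minSecondsBetweenCaptures: 4 },
  balanced: { sampleIntervalSeconds: 0.5, changedPixelRatioThreshold: 0.18, minSecondsBetweenCaptures: 2 },
  sensitive: { sampleIntervalSeconds: 0.25, changedPixelRatioThreshold: 0.1, minSecondsBetweenCaptures: 1 },
};

// Returns a new settings object; fields the preset does not mention are kept as they are.
function applyPreset<S extends PresetFields>(settings: S, name: PresetName): S {
  return { ...settings, ...scanPresets[name] };
}
```

With the store from section 11, a preset button could equally call updateSettings with the preset object, since updateSettings already accepts a partial settings object.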

Phase 4: Worker optimization

Move frame comparison into a worker.

Phase 5: Advanced media backend

Evaluate:

Mediabunny + WebCodecs

This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.

20. Final recommendation

For your app, build V1 with:

Vite + React + TypeScript
HTMLVideoElement
Canvas
Blob URLs
Clipboard API
Optional Zustand

Do not start with:

ffmpeg.wasm
OpenCV.js
Server-side processing

Best V1 strategy:

Sample every 0.5s
Analyze at 160×90
Compare against last accepted key frame
Use threshold + minimum time gap
Rank candidates down to 4–10
Extract full-res only when displaying/copying