docs: add design document for local video key-frame extractor
@@ -0,0 +1,929 @@
|
||||
Recommended approach
|
||||
|
||||
Use:

<input type="file">
  → URL.createObjectURL(file)
  → hidden <video>
  → sample frames periodically
  → draw sampled frame to canvas
  → compare against previous accepted frame
  → keep only significant-change frames
  → user clicks Copy on selected image
  → navigator.clipboard.write()
|
||||
|
||||
URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.
|
||||
|
||||
Do not analyze every frame at full resolution
|
||||
|
||||
For change detection, use a small analysis canvas:
|
||||
|
||||
video frame → draw to 160×90 or 320×180 canvas → compare pixels
|
||||
|
||||
Then only render the full-resolution frame when you decide it is worth keeping.
|
||||
|
||||
Example strategy:
|
||||
|
||||
const ANALYSIS_W = 160;
|
||||
const ANALYSIS_H = 90;
|
||||
const SAMPLE_INTERVAL = 0.5; // seconds
|
||||
const CHANGE_THRESHOLD = 0.18; // tune this
|
||||
|
||||
That means a 10-minute video sampled every 0.5 seconds is only:
|
||||
|
||||
10 min × 60 / 0.5 = 1200 comparisons
|
||||
|
||||
At 160×90, that is cheap.
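
The comparison count above follows directly from the duration and the sampling interval. A minimal sketch, assuming a hypothetical helper named `sampleTimes` (not part of the original notes), that derives each timestamp from an integer index so that intervals which are not exactly representable in binary floating point (0.1, for example) do not accumulate rounding error:

```typescript
// Hypothetical helper: list the timestamps (in seconds) to sample.
function sampleTimes(durationSeconds: number, intervalSeconds: number): number[] {
  // Reject NaN, zero, and negative inputs in one comparison each.
  if (!(durationSeconds > 0) || !(intervalSeconds > 0)) return [];

  const count = Math.ceil(durationSeconds / intervalSeconds);
  const times: number[] = [];
  for (let i = 0; i < count; i++) {
    // Multiply instead of repeatedly adding the interval.
    times.push(i * intervalSeconds);
  }
  return times;
}

// A 10-minute video sampled every 0.5 seconds.
const tenMinuteSamples = sampleTimes(10 * 60, 0.5);
console.log(tenMinuteSamples.length); // 1200
```

The same list can drive a progress indicator, since the total number of samples is known before the scan starts.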
|
||||
|
||||
Simple frame-difference algorithm
|
||||
|
||||
Use a downscaled canvas and compare luminance differences:
|
||||
|
||||
function frameDifference(a, b) {
  let changed = 0;
  const pixels = a.data.length / 4;

  for (let i = 0; i < a.data.length; i += 4) {
    const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
    const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];

    if (Math.abs(lumA - lumB) > 32) changed++;
  }

  return changed / pixels;
}
|
||||
|
||||
Then:
|
||||
|
||||
if (diff > CHANGE_THRESHOLD) {
|
||||
// save this timestamp as a candidate key frame
|
||||
}
|
||||
Better heuristic
|
||||
|
||||
Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.
|
||||
|
||||
That avoids saving several frames during the same transition.
|
||||
|
||||
Frame A accepted
|
||||
Frame B slightly different → reject
|
||||
Frame C slightly different → reject
|
||||
Frame D substantially different from A → accept
|
||||
|
||||
Also add a minimum time gap:
|
||||
|
||||
const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;
|
||||
Clipboard copying
|
||||
|
||||
Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.
|
||||
|
||||
Example:
|
||||
|
||||
async function copyCanvasToClipboard(canvas) {
  // toBlob passes null to its callback when encoding fails, so reject in that case.
  const blob = await new Promise((resolve, reject) =>
    canvas.toBlob(
      b => (b ? resolve(b) : reject(new Error("Canvas export failed"))),
      "image/png"
    )
  );

  await navigator.clipboard.write([
    new ClipboardItem({
      "image/png": blob
    })
  ]);
}
|
||||
|
||||
Important: this should happen from a user gesture, like clicking a Copy button.
|
||||
|
||||
Memory profile
|
||||
|
||||
This is light if implemented correctly:
|
||||
|
||||
1 local video blob URL
|
||||
1 hidden video element
|
||||
1 small analysis canvas
|
||||
1 full-res preview canvas/image
|
||||
4–10 saved output blobs
|
||||
|
||||
Avoid:
|
||||
|
||||
Storing hundreds/thousands of full-res canvases
|
||||
Storing every sampled frame
|
||||
Using FileReader.readAsDataURL(video)
|
||||
Exporting a huge ZIP unless needed
|
||||
|
||||
For your target, keeping 4–10 PNG/WebP blobs in memory is fine.
|
||||
|
||||
Practical UX flow
|
||||
|
||||
I would build it like this:
|
||||
|
||||
User selects video.
|
||||
App scans with adjustable settings:
|
||||
sample interval
|
||||
change threshold
|
||||
minimum seconds between captures
|
||||
max captures
|
||||
App shows candidate thumbnails.
|
||||
User deletes bad picks if needed.
|
||||
Each thumbnail has:
|
||||
Copy PNG
|
||||
Download
|
||||
timestamp
|
||||
Optional: button to rescan with stricter/looser threshold.
|
||||
Caveats
|
||||
Plain <video> seeking may not be perfectly frame-accurate.
|
||||
Sampling every 0.25–1.0s is much more practical than decoding every frame.
|
||||
Browser codec support matters. MP4/H.264 is safest.
|
||||
Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.
|
||||
For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.
|
||||
My recommended V1 stack
|
||||
React/Vite
|
||||
HTMLVideoElement
|
||||
Canvas API
|
||||
Blob/Object URLs
|
||||
navigator.clipboard.write()
|
||||
Optional: localForage or IndexedDB if you want session persistence
|
||||
|
||||
No backend required. No upload required.
|
||||
|
||||
What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.
|
||||
|
||||
I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.
|
||||
|
||||
Recommendation
|
||||
|
||||
Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:
|
||||
|
||||
Vite + React + TypeScript
|
||||
HTMLVideoElement
|
||||
Canvas / OffscreenCanvas
|
||||
Web Worker
|
||||
Blob URLs
|
||||
Clipboard API
|
||||
|
||||
For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a browser/WebAssembly port of FFmpeg, but many real-world reports and project issues revolve around large-file memory limits and browser stability.
|
||||
|
||||
For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.
|
||||
|
||||
Design Document: Local Video Key-Frame Extractor
|
||||
1. Goal
|
||||
|
||||
Build a browser-only app that allows a user to:
|
||||
|
||||
Select a local video file.
|
||||
Analyze it without uploading to a server.
|
||||
Detect frames where the image content has significantly changed.
|
||||
Present a small set of candidate key images.
|
||||
Let the user copy selected images to the clipboard.
|
||||
|
||||
Target input:
|
||||
|
||||
Video size: 100–1000 MB
|
||||
Expected output: 4–10 still images
|
||||
Processing model: local-only
|
||||
Backend: none required
|
||||
2. Recommended architecture
|
||||
User selects video
  ↓
Create local Blob URL
  ↓
Hidden <video> element loads local file
  ↓
Sampling engine seeks through video
  ↓
Downscaled analysis canvas compares frames
  ↓
Candidate detector stores timestamps
  ↓
Full-resolution renderer extracts selected frames
  ↓
Gallery shows thumbnails
  ↓
User copies PNG to clipboard
|
||||
|
||||
Core browser APIs:
|
||||
|
||||
Task                        | API
----------------------------|---------------------------------------------
Local file selection        | File API
Local video loading         | URL.createObjectURL()
Video decoding/seeking      | HTMLVideoElement
Pixel comparison            | Canvas API
Off-main-thread analysis    | Worker + OffscreenCanvas where available
Image export                | canvas.toBlob()
Clipboard copy              | navigator.clipboard.write() + ClipboardItem
|
||||
|
||||
URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.
|
||||
|
||||
3. Framework choice
|
||||
Best V1 stack
|
||||
Vite
|
||||
React
|
||||
TypeScript
|
||||
Zustand or simple React state
|
||||
Canvas API
|
||||
Web Worker
|
||||
|
||||
Why:
|
||||
|
||||
Vite: fast local dev, easy static deployment.
|
||||
React: good for gallery, controls, progress UI, and stateful review flow.
|
||||
TypeScript: useful because media processing code quickly accumulates edge cases.
|
||||
No backend: app can deploy as static files.
|
||||
No heavy media dependency initially: browser decoding is enough for threshold-based change detection.
|
||||
|
||||
Suggested install:
|
||||
|
||||
npm create vite@latest video-keyframe-extractor -- --template react-ts
|
||||
cd video-keyframe-extractor
|
||||
npm install
|
||||
npm run dev
|
||||
|
||||
Optional dependencies:
|
||||
|
||||
npm install zustand
|
||||
|
||||
I would avoid extra UI frameworks unless you want polished controls quickly.
|
||||
|
||||
4. Library/API decision matrix
|
||||
Option | Use for V1? | Notes
--- | --- | ---
Native <video> + canvas | Yes | Best starting point. Simple, no server, no large WASM payload.
Web Worker | Yes | Keeps UI responsive during scans.
OffscreenCanvas | Yes, optional | Good for worker-side pixel analysis where supported.
WebCodecs | V2 | Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames.
Mediabunny | V2 / advanced | Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed.
ffmpeg.wasm | Avoid for V1 | Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection.
OpenCV.js | Probably avoid | Useful for advanced CV, but heavy for simple scene-difference detection.
|
||||
5. App modules
|
||||
5.1 File intake module
|
||||
|
||||
Responsibilities:
|
||||
|
||||
Accept local video file.
|
||||
Validate file type and size.
|
||||
Create blob URL.
|
||||
Load video metadata.
|
||||
Display duration, resolution, codec/browser compatibility status if possible.
|
||||
|
||||
Example state:
|
||||
|
||||
type LoadedVideo = {
  file: File;
  objectUrl: string;
  duration: number;
  width: number;
  height: number;
};
|
||||
|
||||
Important cleanup:
|
||||
|
||||
URL.revokeObjectURL(objectUrl);
|
||||
|
||||
Never use this for the video:
|
||||
|
||||
FileReader.readAsDataURL(file);
|
||||
|
||||
That would load and base64-expand the whole video in memory.
|
||||
|
||||
5.2 Scan settings module
|
||||
|
||||
Expose these controls:
|
||||
|
||||
type ScanSettings = {
  sampleIntervalSeconds: number;      // default: 0.5
  analysisWidth: number;              // default: 160
  analysisHeight: number;             // default: 90
  pixelDeltaThreshold: number;        // default: 32
  changedPixelRatioThreshold: number; // default: 0.18
  minSecondsBetweenCaptures: number;  // default: 2.0
  maxCandidates: number;              // default: 20
  finalTargetCount: number;           // default: 8
};
|
||||
|
||||
Suggested defaults:
|
||||
|
||||
const defaultSettings: ScanSettings = {
  sampleIntervalSeconds: 0.5,
  analysisWidth: 160,
  analysisHeight: 90,
  pixelDeltaThreshold: 32,
  changedPixelRatioThreshold: 0.18,
  minSecondsBetweenCaptures: 2.0,
  maxCandidates: 20,
  finalTargetCount: 8,
};
|
||||
5.3 Frame sampling module
|
||||
|
||||
V1 should seek using a hidden video element:
|
||||
|
||||
async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
  return new Promise((resolve, reject) => {
    // Remove both listeners whichever event fires first.
    const cleanup = () => {
      video.removeEventListener("seeked", onSeeked);
      video.removeEventListener("error", onError);
    };

    const onSeeked = () => {
      cleanup();
      resolve();
    };

    const onError = () => {
      cleanup();
      // video.error can be null, so fall back to a generic Error.
      reject(video.error ?? new Error("Video seek failed"));
    };

    video.addEventListener("seeked", onSeeked);
    video.addEventListener("error", onError);

    // Clamp the target into the valid range [0, duration].
    video.currentTime = Math.max(0, Math.min(time, video.duration));
  });
}
|
||||
|
||||
Then draw into a small analysis canvas:
|
||||
|
||||
function captureAnalysisFrame(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  width: number,
  height: number
): ImageData {
  canvas.width = width;
  canvas.height = height;

  const ctx = canvas.getContext("2d", { willReadFrequently: true });
  if (!ctx) throw new Error("Could not get canvas context");

  ctx.drawImage(video, 0, 0, width, height);
  return ctx.getImageData(0, 0, width, height);
}
|
||||
6. Change detection algorithm
|
||||
6.1 Simple and effective V1
|
||||
|
||||
Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.
|
||||
|
||||
This prevents over-detecting long transitions.
|
||||
|
||||
function frameDifferenceRatio(
  a: ImageData,
  b: ImageData,
  pixelDeltaThreshold: number
): number {
  const dataA = a.data;
  const dataB = b.data;

  // Both frames must come from the same analysis resolution.
  if (dataA.length !== dataB.length) {
    throw new Error("Frames must have identical dimensions");
  }

  let changed = 0;
  const pixels = dataA.length / 4;

  // Step through RGBA pixels and compare per-pixel luminance.
  for (let i = 0; i < dataA.length; i += 4) {
    const lumA =
      0.299 * dataA[i] +
      0.587 * dataA[i + 1] +
      0.114 * dataA[i + 2];

    const lumB =
      0.299 * dataB[i] +
      0.587 * dataB[i + 1] +
      0.114 * dataB[i + 2];

    if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
      changed++;
    }
  }

  // Fraction of pixels whose luminance changed by more than the threshold.
  return changed / pixels;
}
|
||||
|
||||
Acceptance logic:
|
||||
|
||||
function shouldAcceptCandidate(params: {
  diffRatio: number;
  currentTime: number;
  lastAcceptedTime: number | null;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
}) {
  const {
    diffRatio,
    currentTime,
    lastAcceptedTime,
    changedPixelRatioThreshold,
    minSecondsBetweenCaptures,
  } = params;

  if (diffRatio < changedPixelRatioThreshold) return false;

  if (
    lastAcceptedTime !== null &&
    currentTime - lastAcceptedTime < minSecondsBetweenCaptures
  ) {
    return false;
  }

  return true;
}
|
||||
6.2 Better V1.5: histogram difference
|
||||
|
||||
Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.
|
||||
|
||||
A better approach is to compare low-resolution color/luma histograms.
|
||||
|
||||
Recommended hybrid score:
|
||||
|
||||
score = 0.7 × luminance histogram difference
|
||||
+ 0.3 × pixel difference ratio
|
||||
|
||||
This tends to better detect scene/content changes instead of tiny motion.
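
The hybrid score above can be sketched as follows. Only the 0.7 / 0.3 weights come from the formula; the 32-bin histogram, the half-L1 distance used as the "histogram difference", and the function names are assumptions chosen for illustration, not a prescribed implementation.

```typescript
// Build a normalized luminance histogram from RGBA pixel data.
// The bin count (32) is an assumed default, not taken from the design.
function lumaHistogram(rgba: Uint8ClampedArray, bins: number = 32): Float64Array {
  const hist = new Float64Array(bins);
  const pixels = rgba.length / 4;
  if (pixels === 0) return hist;

  for (let i = 0; i < rgba.length; i += 4) {
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    const bin = Math.min(bins - 1, Math.floor((luma / 256) * bins));
    hist[bin] += 1;
  }
  // Normalize counts so the histogram sums to 1.
  for (let b = 0; b < bins; b++) hist[b] /= pixels;
  return hist;
}

// Half the L1 distance between two normalized histograms:
// 0 means identical distributions, 1 means no overlap at all.
function histogramDifference(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / 2;
}

// Weighted blend from the formula above; both inputs are expected in [0, 1].
function hybridScore(histDiff: number, pixelDiffRatio: number): number {
  return 0.7 * histDiff + 0.3 * pixelDiffRatio;
}
```

Because a histogram ignores where pixels are, an object moving across an otherwise unchanged scene barely shifts it, which is why the histogram term carries the larger weight.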
|
||||
|
||||
6.3 Optional V2: perceptual hash
|
||||
|
||||
For more stable detection:
|
||||
|
||||
Generate a small grayscale frame, e.g. 32×32.
|
||||
Compute average hash or difference hash.
|
||||
Compare Hamming distance.
|
||||
Accept when distance exceeds threshold.
|
||||
|
||||
This is good when videos have subtle compression differences but meaningful visual changes.
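
The steps above can be sketched with a difference hash. This is a minimal illustration, assuming the caller has already produced a small grayscale grid (for example 32×32) in row-major order; the function names and the 25% default threshold are hypothetical and would need tuning.

```typescript
// Difference hash: one bit per horizontally adjacent pixel pair.
// `gray` holds width * height luminance values in row-major order.
function differenceHash(gray: Uint8Array, width: number, height: number): Uint8Array {
  if (width < 2 || gray.length !== width * height) {
    throw new Error("Invalid grayscale grid");
  }
  const bits = new Uint8Array(height * (width - 1));
  let k = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width - 1; x++) {
      const left = gray[y * width + x];
      const right = gray[y * width + x + 1];
      bits[k++] = left > right ? 1 : 0;
    }
  }
  return bits;
}

// Number of bit positions where the two hashes disagree.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hash length mismatch");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Hypothetical acceptance rule: accept when more than `minFraction`
// of the hash bits differ.
function isSignificantChange(a: Uint8Array, b: Uint8Array, minFraction: number = 0.25): boolean {
  return hammingDistance(a, b) / a.length > minFraction;
}
```

Since each bit only records whether brightness rises or falls between neighbours, uniform brightness shifts and mild compression noise tend to leave the hash unchanged.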
|
||||
|
||||
7. Scan workflow
|
||||
|
||||
Pseudo-code:
|
||||
|
||||
async function scanVideo(
  video: HTMLVideoElement,
  settings: ScanSettings,
  onProgress: (progress: number) => void
): Promise<CandidateFrame[]> {
  const analysisCanvas = document.createElement("canvas");

  const candidates: CandidateFrame[] = [];
  let lastAcceptedFrame: ImageData | null = null;
  let lastAcceptedTime: number | null = null;

  // Derive each timestamp from an integer index so rounding error
  // does not accumulate across thousands of samples.
  const sampleCount = Math.ceil(video.duration / settings.sampleIntervalSeconds);

  for (let i = 0; i < sampleCount; i++) {
    const time = i * settings.sampleIntervalSeconds;
    await seekVideo(video, time);

    const imageData = captureAnalysisFrame(
      video,
      analysisCanvas,
      settings.analysisWidth,
      settings.analysisHeight
    );

    if (!lastAcceptedFrame) {
      // The first sampled frame is always kept as the reference.
      candidates.push({
        time,
        score: 1,
        reason: "initial-frame",
      });

      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
    } else {
      // Compare against the last accepted key frame, not the previous sample.
      const diffRatio = frameDifferenceRatio(
        lastAcceptedFrame,
        imageData,
        settings.pixelDeltaThreshold
      );

      if (
        shouldAcceptCandidate({
          diffRatio,
          currentTime: time,
          lastAcceptedTime,
          changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
          minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
        })
      ) {
        candidates.push({
          time,
          score: diffRatio,
          reason: "visual-change",
        });

        lastAcceptedFrame = imageData;
        lastAcceptedTime = time;
      }
    }

    // Report progress on every iteration, including the first.
    onProgress((i + 1) / sampleCount);

    // This version stops early once the cap is reached. The alternative is to
    // keep scanning and let the ranking step trim the list afterwards.
    if (candidates.length >= settings.maxCandidates) break;
  }

  return rankAndTrimCandidates(candidates, settings.finalTargetCount);
}
|
||||
|
||||
Candidate type:
|
||||
|
||||
type CandidateFrame = {
  time: number;
  score: number;
  reason: "initial-frame" | "visual-change" | "manual";
  thumbnailUrl?: string;
  fullBlob?: Blob;
};
|
||||
8. Candidate ranking
|
||||
|
||||
Since the desired final count is only 4–10, do not simply take every threshold crossing.
|
||||
|
||||
Use ranking:
|
||||
|
||||
Always include first frame if useful.
|
||||
Sort detected changes by score.
|
||||
Enforce time diversity.
|
||||
Keep the top N.
|
||||
|
||||
Example:
|
||||
|
||||
function rankAndTrimCandidates(
  candidates: CandidateFrame[],
  targetCount: number
): CandidateFrame[] {
  return [...candidates]
    .sort((a, b) => b.score - a.score)
    .slice(0, targetCount)
    .sort((a, b) => a.time - b.time);
}
|
||||
|
||||
Better version:
|
||||
|
||||
Divide video into N temporal buckets
|
||||
Pick the strongest candidate from each bucket
|
||||
Then fill remaining slots by highest score
|
||||
|
||||
This avoids ending up with all images from one active segment.
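
The bucketed ranking described above could look roughly like this. It is a sketch under assumptions: the function name `rankByTemporalBuckets` and the choice of one bucket per requested image are illustrative, and the video duration is passed in explicitly because the candidates alone do not carry it.

```typescript
type ScoredCandidate = { time: number; score: number };

function rankByTemporalBuckets<T extends ScoredCandidate>(
  candidates: T[],
  durationSeconds: number,
  targetCount: number
): T[] {
  if (targetCount <= 0 || !(durationSeconds > 0)) return [];

  // Assumption: use as many buckets as there are requested images.
  const bucketSize = durationSeconds / targetCount;
  const bestPerBucket: (T | undefined)[] = [];
  for (let i = 0; i < targetCount; i++) bestPerBucket.push(undefined);

  // Pass 1: keep the strongest candidate in each temporal bucket.
  for (const c of candidates) {
    const raw = Math.floor(c.time / bucketSize);
    const bucket = Math.max(0, Math.min(targetCount - 1, raw));
    const best = bestPerBucket[bucket];
    if (best === undefined || c.score > best.score) bestPerBucket[bucket] = c;
  }

  const picked: T[] = [];
  for (const c of bestPerBucket) {
    if (c !== undefined) picked.push(c);
  }

  // Pass 2: fill remaining slots with the highest-scoring leftovers.
  const leftovers = candidates
    .filter(c => picked.indexOf(c) === -1)
    .sort((a, b) => b.score - a.score);
  for (const c of leftovers) {
    if (picked.length >= targetCount) break;
    picked.push(c);
  }

  // Return the final selection in playback order.
  return picked.sort((a, b) => a.time - b.time);
}
```

With candidates at 1 s, 2 s, 3 s (high scores) and 50 s (low score) in a 60-second video and a target of 3, a plain top-3 sort would keep only the first three seconds, whereas this version keeps the 50-second frame as well.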
|
||||
|
||||
9. Full-resolution extraction
|
||||
|
||||
Only render full-resolution images for selected candidates.
|
||||
|
||||
async function extractFullResolutionFrame(
  video: HTMLVideoElement,
  time: number,
  type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
  quality?: number
): Promise<Blob> {
  await seekVideo(video, time);

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("Could not get canvas context");

  ctx.drawImage(video, 0, 0);

  return new Promise((resolve, reject) => {
    canvas.toBlob(
      blob => {
        if (!blob) reject(new Error("Canvas export failed"));
        else resolve(blob);
      },
      type,
      quality
    );
  });
}
|
||||
|
||||
For clipboard, prefer PNG:
|
||||
|
||||
async function copyImageBlobToClipboard(blob: Blob) {
  await navigator.clipboard.write([
    new ClipboardItem({
      [blob.type]: blob,
    }),
  ]);
}
|
||||
|
||||
Use:
|
||||
|
||||
const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
|
||||
await copyImageBlobToClipboard(blob);
|
||||
|
||||
Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.
|
||||
|
||||
10. UI design
|
||||
Main layout
|
||||
┌──────────────────────────────────────────────┐
│ Video Key Frame Extractor                    │
├──────────────────────────────────────────────┤
│ [Select video] filename.mp4                  │
│ Duration: 04:32 | Size: 612 MB | 1920×1080   │
├──────────────────────────────────────────────┤
│ Scan Settings                                │
│ Sample interval: [0.5s]                      │
│ Change threshold: [18%]                      │
│ Min gap: [2.0s]                              │
│ Target images: [8]                           │
│ [Scan Video]                                 │
├──────────────────────────────────────────────┤
│ Progress bar                                 │
├──────────────────────────────────────────────┤
│ Candidate Gallery                            │
│ [thumb] 00:04.5 score .24 [Copy] [Download]  │
│ [thumb] 00:18.0 score .41 [Copy] [Download]  │
│ [thumb] 01:22.5 score .37 [Copy] [Download]  │
└──────────────────────────────────────────────┘
|
||||
Controls
|
||||
|
||||
Minimum controls:
|
||||
|
||||
Select video
|
||||
Scan
|
||||
Stop scan
|
||||
Reset
|
||||
Threshold slider
|
||||
Sample interval dropdown
|
||||
Target count
|
||||
Copy image
|
||||
Download image
|
||||
|
||||
Advanced controls:
|
||||
|
||||
“Prefer fewer images”
|
||||
“Prefer more images”
|
||||
“Ignore tiny motion”
|
||||
“Include first frame”
|
||||
“Manually add current preview frame”
|
||||
11. State model
|
||||
|
||||
Use Zustand or plain React reducer.
|
||||
|
||||
type AppState = {
  video: LoadedVideo | null;
  scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
  settings: ScanSettings;
  candidates: CandidateFrame[];
  selectedCandidateId: string | null;
  progress: number;
  error: string | null;
};
|
||||
|
||||
Actions:
|
||||
|
||||
type AppActions = {
  loadVideo(file: File): Promise<void>;
  scanVideo(): Promise<void>;
  cancelScan(): void;
  updateSettings(settings: Partial<ScanSettings>): void;
  copyCandidate(candidateId: string): Promise<void>;
  downloadCandidate(candidateId: string): Promise<void>;
  removeCandidate(candidateId: string): void;
};
|
||||
12. Worker design
|
||||
|
||||
For V1, the main thread must own the hidden video element, because HTMLVideoElement is a DOM object and is not available inside workers. The pixel comparison itself can be pushed into a worker.
|
||||
|
||||
Main thread:
|
||||
|
||||
seek video
|
||||
draw frame to analysis canvas
|
||||
get ImageData
|
||||
send ImageData buffer to worker
|
||||
worker returns diff score
|
||||
|
||||
Worker message:
|
||||
|
||||
type AnalyzeFrameMessage = {
  type: "analyze-frame";
  current: ImageData;
  previous: ImageData;
  pixelDeltaThreshold: number;
};
|
||||
|
||||
Worker response:
|
||||
|
||||
type AnalyzeFrameResult = {
  type: "frame-score";
  diffRatio: number;
};
|
||||
|
||||
For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.
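
One way to keep that move cheap is to write the comparison as a pure message handler from the start. The sketch below is an assumption-laden illustration: it passes raw `Uint8ClampedArray` pixel buffers rather than `ImageData` so the handler has no DOM dependency, and the name `handleAnalyzeFrame` is hypothetical.

```typescript
type PixelBuffer = Uint8ClampedArray;

// Variant of the message above that carries raw RGBA buffers (an assumption).
type AnalyzeFrameBufferMessage = {
  type: "analyze-frame";
  current: PixelBuffer;
  previous: PixelBuffer;
  pixelDeltaThreshold: number;
};

type FrameScoreResult = {
  type: "frame-score";
  diffRatio: number;
};

// Luminance of the RGBA pixel starting at byte offset i.
function lumaAt(data: PixelBuffer, i: number): number {
  return 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
}

// Pure handler: takes a message, returns a result, touches no globals.
function handleAnalyzeFrame(msg: AnalyzeFrameBufferMessage): FrameScoreResult {
  const { current, previous, pixelDeltaThreshold } = msg;
  if (current.length !== previous.length) {
    throw new Error("Frame size mismatch");
  }

  const pixels = current.length / 4;
  let changed = 0;
  for (let i = 0; i < current.length; i += 4) {
    if (Math.abs(lumaAt(current, i) - lumaAt(previous, i)) > pixelDeltaThreshold) {
      changed++;
    }
  }

  return { type: "frame-score", diffRatio: pixels === 0 ? 0 : changed / pixels };
}

// Inside a worker file this would be wired up roughly as:
//   self.onmessage = (e) => self.postMessage(handleAnalyzeFrame(e.data));
```

Keeping the handler free of worker globals means the same function can be called directly on the main thread in V1 and moved behind `postMessage` later without changes.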
|
||||
|
||||
13. Performance expectations
|
||||
|
||||
For a 10-minute video:
|
||||
|
||||
Sample interval: 0.5s
|
||||
Comparisons: ~1200
|
||||
Analysis resolution: 160×90
|
||||
Pixels per comparison: 14,400
|
||||
Total pixel comparisons: ~17.3 million
|
||||
|
||||
That is very reasonable in browser JavaScript.
|
||||
|
||||
For a 30-minute video:
|
||||
|
||||
Sample interval: 0.5s
|
||||
Comparisons: ~3600
|
||||
Analysis resolution: 160×90
|
||||
Total pixel comparisons: ~51.8 million
|
||||
|
||||
Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
|
||||
|
||||
14. File-size considerations
|
||||
|
||||
Your 100–1000 MB range is practical because:
|
||||
|
||||
The video is referenced via local file blob URL.
|
||||
You only decode sampled frames.
|
||||
You only store 4–10 output blobs.
|
||||
Analysis frames are tiny.
|
||||
|
||||
Memory use should look roughly like:
|
||||
|
||||
Video decoder buffers: browser-managed
|
||||
Analysis frame A: ~160 × 90 × 4 = 57.6 KB
|
||||
Analysis frame B: ~57.6 KB
|
||||
Full-res output image: only when accepted/copied
|
||||
Candidate blobs: 4–10 images
|
||||
|
||||
At 1920×1080, one raw RGBA canvas frame is about:
|
||||
|
||||
1920 × 1080 × 4 = ~8.3 MB
|
||||
|
||||
At 3840×2160:
|
||||
|
||||
3840 × 2160 × 4 = ~33.2 MB
|
||||
|
||||
Still fine if you only create one full-res canvas at a time.
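
The figures above come from four bytes per pixel (RGBA). A small sketch of that arithmetic, with hypothetical helper names, reporting decimal megabytes to match the numbers quoted in this section:

```typescript
// Raw RGBA canvas frame size: 4 bytes per pixel.
function rawFrameBytes(width: number, height: number): number {
  return width * height * 4;
}

// Decimal megabytes (1 MB = 1,000,000 bytes), matching the figures above.
function toMegabytes(bytes: number): number {
  return bytes / 1e6;
}

console.log(rawFrameBytes(160, 90));                           // 57600
console.log(toMegabytes(rawFrameBytes(1920, 1080)).toFixed(1)); // "8.3"
console.log(toMegabytes(rawFrameBytes(3840, 2160)).toFixed(1)); // "33.2"
```

These are sizes of uncompressed canvas buffers; the PNG or WebP blobs kept for the gallery are usually much smaller, though the exact ratio depends on the image content.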
|
||||
|
||||
15. Browser compatibility
|
||||
Best target
|
||||
|
||||
Start with Chromium-based browsers:
|
||||
|
||||
Chrome
|
||||
Edge
|
||||
Brave
|
||||
Arc
|
||||
Chromium
|
||||
|
||||
Reasons:
|
||||
|
||||
Strong media support.
|
||||
Good Clipboard API support.
|
||||
Good WebCodecs path for later.
|
||||
|
||||
Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.
|
||||
|
||||
Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.
|
||||
|
||||
16. Security and privacy model
|
||||
|
||||
The app should explicitly state:
|
||||
|
||||
Your video is processed locally in your browser.
|
||||
The file is not uploaded.
|
||||
No server receives the video or extracted images.
|
||||
|
||||
Implementation requirements:
|
||||
|
||||
Do not send file contents to analytics.
|
||||
Do not log object URLs remotely.
|
||||
Avoid third-party scripts if privacy is part of the value prop.
|
||||
Serve over HTTPS.
|
||||
Make the app installable as a PWA if desired.
|
||||
17. Error handling
|
||||
|
||||
Handle:
|
||||
|
||||
Case | Response
--- | ---
Unsupported video codec | Show “This browser cannot decode this video.”
Huge resolution | Offer lower-quality extraction or warn about memory.
Clipboard blocked | Offer download fallback.
Scan too slow | Let user increase sample interval.
Too many candidates | Raise threshold or bucket results.
No candidates found | Lower threshold or include first/middle/end frames.
|
||||
|
||||
Clipboard fallback:
|
||||
|
||||
try {
  await copyImageBlobToClipboard(blob);
} catch {
  downloadBlob(blob, "frame.png");
}
|
||||
18. Suggested file structure
|
||||
src/
  app/
    App.tsx
    store.ts
  components/
    FilePicker.tsx
    VideoMetadataPanel.tsx
    ScanSettingsPanel.tsx
    ProgressPanel.tsx
    CandidateGallery.tsx
    CandidateCard.tsx
  media/
    loadVideo.ts
    seekVideo.ts
    captureFrame.ts
    extractFullFrame.ts
    clipboard.ts
  analysis/
    frameDifference.ts
    histogramDifference.ts
    candidateRanking.ts
    scanVideo.ts
  workers/
    frameAnalysis.worker.ts
  types/
    media.ts
    scan.ts
|
||||
19. MVP implementation plan
|
||||
Phase 1: Core prototype
|
||||
|
||||
Build:
|
||||
|
||||
File picker
|
||||
Hidden video loader
|
||||
Metadata display
|
||||
Scan button
|
||||
Basic frame-difference detection
|
||||
Candidate list with timestamps
|
||||
Thumbnail generation
|
||||
|
||||
No worker yet.
|
||||
|
||||
Phase 2: Copy/export
|
||||
|
||||
Add:
|
||||
|
||||
Full-res frame extraction
|
||||
Copy PNG to clipboard
|
||||
Download PNG fallback
|
||||
Remove candidate
|
||||
Manual add candidate
|
||||
Phase 3: Better detection
|
||||
|
||||
Add:
|
||||
|
||||
Histogram comparison
|
||||
Temporal bucketing
|
||||
Threshold presets
|
||||
“Strict / Balanced / Sensitive” modes
|
||||
|
||||
Example presets:
|
||||
|
||||
const presets = {
  strict: {
    sampleIntervalSeconds: 1.0,
    changedPixelRatioThreshold: 0.28,
    minSecondsBetweenCaptures: 4,
  },
  balanced: {
    sampleIntervalSeconds: 0.5,
    changedPixelRatioThreshold: 0.18,
    minSecondsBetweenCaptures: 2,
  },
  sensitive: {
    sampleIntervalSeconds: 0.25,
    changedPixelRatioThreshold: 0.1,
    minSecondsBetweenCaptures: 1,
  },
};
|
||||
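
Each preset only overrides three fields, so applying one amounts to merging it over the current settings. A minimal sketch, assuming a hypothetical `applyPreset` helper and repeating the `ScanSettings` shape and preset values from this document so the block is self-contained:

```typescript
type ScanSettings = {
  sampleIntervalSeconds: number;
  analysisWidth: number;
  analysisHeight: number;
  pixelDeltaThreshold: number;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
  maxCandidates: number;
  finalTargetCount: number;
};

type PresetName = "strict" | "balanced" | "sensitive";

// Same values as the example presets above.
const scanPresets: Record<PresetName, Partial<ScanSettings>> = {
  strict: { sampleIntervalSeconds: 1.0, changedPixelRatioThreshold: 0.28, minSecondsBetweenCaptures: 4 },
  balanced: { sampleIntervalSeconds: 0.5, changedPixelRatioThreshold: 0.18, minSecondsBetweenCaptures: 2 },
  sensitive: { sampleIntervalSeconds: 0.25, changedPixelRatioThreshold: 0.1, minSecondsBetweenCaptures: 1 },
};

// Hypothetical helper: returns a new settings object and leaves `base` untouched.
function applyPreset(base: ScanSettings, name: PresetName): ScanSettings {
  return { ...base, ...scanPresets[name] };
}

const baseSettings: ScanSettings = {
  sampleIntervalSeconds: 0.5,
  analysisWidth: 160,
  analysisHeight: 90,
  pixelDeltaThreshold: 32,
  changedPixelRatioThreshold: 0.18,
  minSecondsBetweenCaptures: 2.0,
  maxCandidates: 20,
  finalTargetCount: 8,
};

const strictSettings = applyPreset(baseSettings, "strict");
console.log(strictSettings.changedPixelRatioThreshold); // 0.28
```

Fields a preset does not mention, such as the analysis resolution and target count, keep whatever the user had already chosen.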
Phase 4: Worker optimization
|
||||
|
||||
Move frame comparison into a worker.
|
||||
|
||||
Phase 5: Advanced media backend
|
||||
|
||||
Evaluate:
|
||||
|
||||
Mediabunny + WebCodecs
|
||||
|
||||
This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.
|
||||
|
||||
20. Final recommendation
|
||||
|
||||
For your app, build V1 with:
|
||||
|
||||
Vite + React + TypeScript
|
||||
HTMLVideoElement
|
||||
Canvas
|
||||
Blob URLs
|
||||
Clipboard API
|
||||
Optional Zustand
|
||||
|
||||
Do not start with:
|
||||
|
||||
ffmpeg.wasm
|
||||
OpenCV.js
|
||||
Server-side processing
|
||||
|
||||
Best V1 strategy:
|
||||
|
||||
Sample every 0.5s
|
||||
Analyze at 160×90
|
||||
Compare against last accepted key frame
|
||||
Use threshold + minimum time gap
|
||||
Rank candidates down to 4–10
|
||||
Extract full-res only when displaying/copying
|
||||