929 lines
24 KiB
Markdown
Recommended approach

Use:

    <input type="file">
      → URL.createObjectURL(file)
      → hidden <video>
      → sample frames periodically
      → draw sampled frame to canvas
      → compare against previous accepted frame
      → keep only significant-change frames
      → user clicks Copy on selected image
      → navigator.clipboard.write()

URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.

Do not analyze every frame at full resolution

For change detection, use a small analysis canvas:

    video frame → draw to 160×90 or 320×180 canvas → compare pixels

Then only render the full-resolution frame when you decide it is worth keeping.
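
The 160×90 figure assumes a 16:9 source. As a minimal sketch, the analysis size could instead be derived from the video's own dimensions so that portrait or 4:3 footage is not distorted. The helper name `analysisSize` and its `maxSide` parameter are illustrative assumptions, not part of any browser API:

```typescript
// Hypothetical helper: scale the source dimensions so the longer side
// equals `maxSide`, keeping the aspect ratio. It never upscales.
function analysisSize(
  videoWidth: number,
  videoHeight: number,
  maxSide = 160
): { width: number; height: number } {
  if (videoWidth <= 0 || videoHeight <= 0) {
    throw new Error("Video dimensions must be positive");
  }
  const scale = Math.min(1, maxSide / Math.max(videoWidth, videoHeight));
  return {
    width: Math.max(1, Math.round(videoWidth * scale)),
    height: Math.max(1, Math.round(videoHeight * scale)),
  };
}

console.log(analysisSize(1920, 1080)); // 16:9 landscape source
console.log(analysisSize(1080, 1920)); // 9:16 portrait source
```

In the browser, the inputs would come from `video.videoWidth` and `video.videoHeight` once metadata has loaded.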

Example strategy:

    const ANALYSIS_W = 160;
    const ANALYSIS_H = 90;
    const SAMPLE_INTERVAL = 0.5; // seconds
    const CHANGE_THRESHOLD = 0.18; // fraction of changed pixels; tune this

That means a 10-minute video sampled every 0.5 seconds is only:

    10 min × 60 s/min ÷ 0.5 s = 1200 comparisons

At 160×90, that is cheap.

Simple frame-difference algorithm

Use a downscaled canvas and compare luminance differences:

    // a and b are ImageData objects of the same size.
    // Returns the fraction of pixels whose luminance changed noticeably.
    function frameDifference(a, b) {
      let changed = 0;
      const pixels = a.data.length / 4; // 4 bytes per pixel (RGBA)

      for (let i = 0; i < a.data.length; i += 4) {
        const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
        const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];

        // 32 is the per-pixel luminance delta (on a 0–255 scale) that counts as "changed"
        if (Math.abs(lumA - lumB) > 32) changed++;
      }

      return changed / pixels;
    }

Then:

    const diff = frameDifference(referenceFrame, currentFrame);

    if (diff > CHANGE_THRESHOLD) {
      // save this timestamp as a candidate key frame
    }

Better heuristic

Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.

That avoids saving several frames during the same transition.

    Frame A accepted
    Frame B slightly different → reject
    Frame C slightly different → reject
    Frame D substantially different from A → accept

Also add a minimum time gap:

    const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;

Clipboard copying

Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.

Example:

    async function copyCanvasToClipboard(canvas) {
      // toBlob reports failure by passing null, so reject in that case
      const blob = await new Promise((resolve, reject) =>
        canvas.toBlob(
          b => (b ? resolve(b) : reject(new Error("Canvas export failed"))),
          "image/png"
        )
      );

      await navigator.clipboard.write([
        new ClipboardItem({
          "image/png": blob
        })
      ]);
    }

Important: this should happen from a user gesture, like clicking a Copy button.

Memory profile

This is light if implemented correctly:

- 1 local video blob URL
- 1 hidden video element
- 1 small analysis canvas
- 1 full-res preview canvas/image
- 4–10 saved output blobs

Avoid:

- Storing hundreds/thousands of full-res canvases
- Storing every sampled frame
- Using FileReader.readAsDataURL(video)
- Exporting a huge ZIP unless needed

For your target, keeping 4–10 PNG/WebP blobs in memory is fine.

Practical UX flow

I would build it like this:

- User selects video.
- App scans with adjustable settings:
  - sample interval
  - change threshold
  - minimum seconds between captures
  - max captures
- App shows candidate thumbnails.
- User deletes bad picks if needed.
- Each thumbnail has:
  - Copy PNG
  - Download
  - timestamp
- Optional: button to rescan with stricter/looser threshold.

Caveats

- Plain <video> seeking may not be perfectly frame-accurate.
- Sampling every 0.25–1.0s is much more practical than decoding every frame.
- Browser codec support matters. MP4/H.264 is safest.
- Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.
- For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.

My recommended V1 stack

- React/Vite
- HTMLVideoElement
- Canvas API
- Blob/Object URLs
- navigator.clipboard.write()
- Optional: localForage or IndexedDB if you want session persistence

No backend required. No upload required.

What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.

I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.

Recommendation

Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:

- Vite + React + TypeScript
- HTMLVideoElement
- Canvas / OffscreenCanvas
- Web Worker
- Blob URLs
- Clipboard API

For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a browser/WebAssembly port of FFmpeg, but many real-world reports and project issues revolve around large-file memory limits and browser stability.

For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.

Design Document: Local Video Key-Frame Extractor

1. Goal

Build a browser-only app that allows a user to:

- Select a local video file.
- Analyze it without uploading to a server.
- Detect frames where the image content has significantly changed.
- Present a small set of candidate key images.
- Let the user copy selected images to the clipboard.

Target input:

- Video size: 100–1000 MB
- Expected output: 4–10 still images
- Processing model: local-only
- Backend: none required

2. Recommended architecture

    User selects video
      ↓
    Create local Blob URL
      ↓
    Hidden <video> element loads local file
      ↓
    Sampling engine seeks through video
      ↓
    Downscaled analysis canvas compares frames
      ↓
    Candidate detector stores timestamps
      ↓
    Full-resolution renderer extracts selected frames
      ↓
    Gallery shows thumbnails
      ↓
    User copies PNG to clipboard

Core browser APIs:

| Task | API |
| --- | --- |
| Local file selection | File API |
| Local video loading | URL.createObjectURL() |
| Video decoding/seeking | HTMLVideoElement |
| Pixel comparison | Canvas API |
| Off-main-thread analysis | Worker + OffscreenCanvas where available |
| Image export | canvas.toBlob() |
| Clipboard copy | navigator.clipboard.write() + ClipboardItem |

URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.

3. Framework choice

Best V1 stack

- Vite
- React
- TypeScript
- Zustand or simple React state
- Canvas API
- Web Worker

Why:

- Vite: fast local dev, easy static deployment.
- React: good for gallery, controls, progress UI, and stateful review flow.
- TypeScript: useful because media processing code quickly accumulates edge cases.
- No backend: app can deploy as static files.
- No heavy media dependency initially: browser decoding is enough for threshold-based change detection.

Suggested install:

    npm create vite@latest video-keyframe-extractor -- --template react-ts
    cd video-keyframe-extractor
    npm install
    npm run dev

Optional dependencies:

    npm install zustand

I would avoid extra UI frameworks unless you want polished controls quickly.

4. Library/API decision matrix

| Option | Use for V1? | Notes |
| --- | --- | --- |
| Native <video> + canvas | Yes | Best starting point. Simple, no server, no large WASM payload. |
| Web Worker | Yes | Keeps UI responsive during scans. |
| OffscreenCanvas | Yes, optional | Good for worker-side pixel analysis where supported. |
| WebCodecs | V2 | Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames. |
| Mediabunny | V2 / advanced | Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed. |
| ffmpeg.wasm | Avoid for V1 | Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection. |
| OpenCV.js | Probably avoid | Useful for advanced CV, but heavy for simple scene-difference detection. |

5. App modules

5.1 File intake module

Responsibilities:

- Accept local video file.
- Validate file type and size.
- Create blob URL.
- Load video metadata.
- Display duration, resolution, codec/browser compatibility status if possible.

Example state:

    type LoadedVideo = {
      file: File;
      objectUrl: string;
      duration: number;
      width: number;
      height: number;
    };

Important cleanup:

    URL.revokeObjectURL(objectUrl);

Never use this for the video:

    FileReader.readAsDataURL(file);

That would load and base64-expand the whole video in memory.
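
To make that cost concrete: base64 encodes every 3 bytes as 4 characters, so the data URL text is about a third larger than the file before counting how the engine stores the string. A rough sketch of the arithmetic follows; `base64Length` is an illustrative helper name, and real memory use depends on the browser's string representation:

```typescript
// Number of base64 characters needed to encode `byteLength` bytes,
// including "=" padding (4 output characters per 3 input bytes).
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const MB = 1024 * 1024;
for (const sizeMB of [100, 500, 1000]) {
  const chars = base64Length(sizeMB * MB);
  console.log(`${sizeMB} MB file -> about ${(chars / MB).toFixed(0)} MB of base64 text`);
}
```

A blob URL, by contrast, is only a short reference string, whatever the size of the file behind it.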

5.2 Scan settings module

Expose these controls:

    type ScanSettings = {
      sampleIntervalSeconds: number; // default: 0.5
      analysisWidth: number; // default: 160
      analysisHeight: number; // default: 90
      pixelDeltaThreshold: number; // default: 32
      changedPixelRatioThreshold: number; // default: 0.18
      minSecondsBetweenCaptures: number; // default: 2.0
      maxCandidates: number; // default: 20
      finalTargetCount: number; // default: 8
    };

Suggested defaults:

    const defaultSettings: ScanSettings = {
      sampleIntervalSeconds: 0.5,
      analysisWidth: 160,
      analysisHeight: 90,
      pixelDeltaThreshold: 32,
      changedPixelRatioThreshold: 0.18,
      minSecondsBetweenCaptures: 2.0,
      maxCandidates: 20,
      finalTargetCount: 8,
    };

5.3 Frame sampling module

V1 should seek using a hidden video element:

    // Resolves once the video has finished seeking to `time` (in seconds).
    async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
      return new Promise((resolve, reject) => {
        const onSeeked = () => {
          cleanup();
          resolve();
        };

        const onError = () => {
          cleanup();
          reject(video.error);
        };

        // Remove both listeners, whichever event fired first
        const cleanup = () => {
          video.removeEventListener("seeked", onSeeked);
          video.removeEventListener("error", onError);
        };

        video.addEventListener("seeked", onSeeked, { once: true });
        video.addEventListener("error", onError, { once: true });

        // Clamp so we never ask for a position past the end of the video
        video.currentTime = Math.min(time, video.duration);
      });
    }

Then draw into a small analysis canvas:

    function captureAnalysisFrame(
      video: HTMLVideoElement,
      canvas: HTMLCanvasElement,
      width: number,
      height: number
    ): ImageData {
      canvas.width = width;
      canvas.height = height;

      // willReadFrequently hints that getImageData will be called often
      const ctx = canvas.getContext("2d", { willReadFrequently: true });
      if (!ctx) throw new Error("Could not get canvas context");

      // Draw the current video frame scaled down to the analysis size
      ctx.drawImage(video, 0, 0, width, height);
      return ctx.getImageData(0, 0, width, height);
    }

6. Change detection algorithm

6.1 Simple and effective V1

Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.

This prevents over-detecting long transitions.

    // Returns the fraction (0..1) of pixels whose luminance differs by more
    // than pixelDeltaThreshold between the two frames.
    function frameDifferenceRatio(
      a: ImageData,
      b: ImageData,
      pixelDeltaThreshold: number
    ): number {
      const dataA = a.data;
      const dataB = b.data;

      // Both frames must come from the same analysis canvas size
      if (dataA.length !== dataB.length) {
        throw new Error("Frames must have the same dimensions");
      }

      let changed = 0;
      const pixels = dataA.length / 4; // 4 bytes per pixel (RGBA)

      for (let i = 0; i < dataA.length; i += 4) {
        const lumA =
          0.299 * dataA[i] +
          0.587 * dataA[i + 1] +
          0.114 * dataA[i + 2];

        const lumB =
          0.299 * dataB[i] +
          0.587 * dataB[i + 1] +
          0.114 * dataB[i + 2];

        if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
          changed++;
        }
      }

      return changed / pixels;
    }

Acceptance logic:

    function shouldAcceptCandidate(params: {
      diffRatio: number;
      currentTime: number;
      lastAcceptedTime: number | null;
      changedPixelRatioThreshold: number;
      minSecondsBetweenCaptures: number;
    }) {
      const {
        diffRatio,
        currentTime,
        lastAcceptedTime,
        changedPixelRatioThreshold,
        minSecondsBetweenCaptures,
      } = params;

      // Not enough of the frame changed
      if (diffRatio < changedPixelRatioThreshold) return false;

      // Too soon after the previous accepted frame
      if (
        lastAcceptedTime !== null &&
        currentTime - lastAcceptedTime < minSecondsBetweenCaptures
      ) {
        return false;
      }

      return true;
    }

6.2 Better V1.5: histogram difference

Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.

A better approach is to compare low-resolution color/luma histograms.

Recommended hybrid score:

    score = 0.7 × luminance histogram difference
          + 0.3 × pixel difference ratio

This tends to better detect scene/content changes instead of tiny motion.
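
A minimal sketch of that hybrid score, assuming a 32-bin luminance histogram and half the L1 distance as the histogram difference (both are illustrative choices, not prescribed above). The `RgbaFrame` type is a structural stand-in for `ImageData`, and all function names are hypothetical:

```typescript
// Structural stand-in for ImageData so the sketch also runs outside a browser.
type RgbaFrame = { data: Uint8ClampedArray };

const luma = (d: Uint8ClampedArray, i: number): number =>
  0.299 * d[i] + 0.587 * d[i + 1] + 0.114 * d[i + 2];

// Normalized luminance histogram; the bin count of 32 is an arbitrary choice.
function lumaHistogram(frame: RgbaFrame, bins = 32): Float64Array {
  const hist = new Float64Array(bins);
  const pixels = frame.data.length / 4;
  for (let i = 0; i < frame.data.length; i += 4) {
    const bin = Math.min(bins - 1, Math.floor((luma(frame.data, i) / 256) * bins));
    hist[bin] += 1 / pixels;
  }
  return hist;
}

// Half the L1 distance between the histograms: 0 (identical) to 1 (disjoint).
function histogramDifference(a: RgbaFrame, b: RgbaFrame, bins = 32): number {
  const ha = lumaHistogram(a, bins);
  const hb = lumaHistogram(b, bins);
  let sum = 0;
  for (let i = 0; i < bins; i++) sum += Math.abs(ha[i] - hb[i]);
  return sum / 2;
}

// Fraction of pixels whose luminance changed by more than `pixelDelta`.
function pixelDifferenceRatio(a: RgbaFrame, b: RgbaFrame, pixelDelta = 32): number {
  let changed = 0;
  for (let i = 0; i < a.data.length; i += 4) {
    if (Math.abs(luma(a.data, i) - luma(b.data, i)) > pixelDelta) changed++;
  }
  return changed / (a.data.length / 4);
}

// Weighted blend using the 0.7 / 0.3 weights from the formula above.
function hybridScore(a: RgbaFrame, b: RgbaFrame): number {
  return 0.7 * histogramDifference(a, b) + 0.3 * pixelDifferenceRatio(a, b);
}

// Tiny synthetic frames (4 pixels each): solid gray level `v`, fully opaque.
const solid = (v: number): RgbaFrame => ({
  data: new Uint8ClampedArray([v, v, v, 255, v, v, v, 255, v, v, v, 255, v, v, v, 255]),
});
console.log(hybridScore(solid(0), solid(0)));   // identical frames
console.log(hybridScore(solid(0), solid(255))); // black vs. white
```

Because the histogram ignores where pixels are, a pan across the same scene moves the pixel term much more than the histogram term, which is the behaviour the weighting is meant to exploit.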

6.3 Optional V2: perceptual hash

For more stable detection:

- Generate a small grayscale frame, e.g. 32×32.
- Compute average hash or difference hash.
- Compare Hamming distance.
- Accept when distance exceeds threshold.

This is good when videos have subtle compression differences but meaningful visual changes.
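
The steps above can be sketched with an average hash. This is a simplified illustration: the input is assumed to be a grayscale array with one 0–255 value per pixel, the function names are hypothetical, and the 10% default threshold is a guess that would need tuning:

```typescript
// Average hash: one bit per pixel, set when the pixel is at least as bright
// as the frame's mean. `gray` holds one 0-255 value per pixel, for example
// 32×32 = 1024 values from a downscaled grayscale frame.
function averageHash(gray: Uint8Array): Uint8Array {
  let sum = 0;
  for (let i = 0; i < gray.length; i++) sum += gray[i];
  const mean = sum / gray.length;
  return gray.map(v => (v >= mean ? 1 : 0));
}

// Hamming distance: number of positions where the two hashes differ.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Hypothetical acceptance rule: the frame counts as changed when more than
// `maxSimilarBits` hash bits differ. The 10% default is an assumption.
function isSignificantChange(
  a: Uint8Array,
  b: Uint8Array,
  maxSimilarBits = Math.floor(a.length * 0.1)
): boolean {
  return hammingDistance(averageHash(a), averageHash(b)) > maxSimilarBits;
}

// 4×4 examples: bright left half, the same image slightly brighter,
// and a mirrored image with the bright half on the right.
const leftBright = new Uint8Array([200, 200, 10, 10, 200, 200, 10, 10, 200, 200, 10, 10, 200, 200, 10, 10]);
const leftBrighter = leftBright.map(v => v + 20);
const rightBright = new Uint8Array([10, 10, 200, 200, 10, 10, 200, 200, 10, 10, 200, 200, 10, 10, 200, 200]);

console.log(isSignificantChange(leftBright, leftBrighter));
console.log(isSignificantChange(leftBright, rightBright));
```

A uniform brightness shift leaves every pixel on the same side of the mean, so the hash stays the same, while a change in layout flips many bits.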

7. Scan workflow

Pseudo-code:

    async function scanVideo(
      video: HTMLVideoElement,
      settings: ScanSettings,
      onProgress: (progress: number) => void
    ): Promise<CandidateFrame[]> {
      const analysisCanvas = document.createElement("canvas");

      const candidates: CandidateFrame[] = [];
      let lastAcceptedFrame: ImageData | null = null;
      let lastAcceptedTime: number | null = null;

      // Derive each timestamp from an index instead of accumulating
      // `time += interval`, which drifts for intervals such as 0.1
      const sampleCount = Math.ceil(
        video.duration / settings.sampleIntervalSeconds
      );

      for (let index = 0; index < sampleCount; index++) {
        const time = index * settings.sampleIntervalSeconds;
        await seekVideo(video, time);

        const imageData = captureAnalysisFrame(
          video,
          analysisCanvas,
          settings.analysisWidth,
          settings.analysisHeight
        );

        // The first sample becomes the initial reference frame
        if (!lastAcceptedFrame) {
          candidates.push({
            time,
            score: 1,
            reason: "initial-frame",
          });

          lastAcceptedFrame = imageData;
          lastAcceptedTime = time;
          continue;
        }

        // Compare against the last accepted key frame, not the previous sample
        const diffRatio = frameDifferenceRatio(
          lastAcceptedFrame,
          imageData,
          settings.pixelDeltaThreshold
        );

        if (
          shouldAcceptCandidate({
            diffRatio,
            currentTime: time,
            lastAcceptedTime,
            changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
            minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
          })
        ) {
          candidates.push({
            time,
            score: diffRatio,
            reason: "visual-change",
          });

          lastAcceptedFrame = imageData;
          lastAcceptedTime = time;
        }

        // Report completed samples so progress reaches 1 on the last one
        onProgress((index + 1) / sampleCount);

        if (candidates.length >= settings.maxCandidates) {
          // optional: stop early or continue and later rank candidates
        }
      }

      return rankAndTrimCandidates(candidates, settings.finalTargetCount);
    }

Candidate type:

    type CandidateFrame = {
      time: number;
      score: number;
      reason: "initial-frame" | "visual-change" | "manual";
      thumbnailUrl?: string;
      fullBlob?: Blob;
    };

8. Candidate ranking

Since the desired final count is only 4–10, do not simply take every threshold crossing.

Use ranking:

- Always include first frame if useful.
- Sort detected changes by score.
- Enforce time diversity.
- Keep the top N.

Example:

    function rankAndTrimCandidates(
      candidates: CandidateFrame[],
      targetCount: number
    ): CandidateFrame[] {
      return [...candidates]
        // Highest score first; the initial frame has score 1, so it survives
        .sort((a, b) => b.score - a.score)
        .slice(0, targetCount)
        // Return the survivors in playback order
        .sort((a, b) => a.time - b.time);
    }

Better version:

- Divide video into N temporal buckets
- Pick the strongest candidate from each bucket
- Then fill remaining slots by highest score

This avoids ending up with all images from one active segment.
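
One way the bucketed version might look, as a sketch under stated assumptions: the number of buckets equals the target count, and `rankWithTemporalBuckets` plus the reduced `Candidate` type are illustrative names rather than part of the design above:

```typescript
// Reduced candidate shape for the sketch; the real CandidateFrame has more fields.
type Candidate = { time: number; score: number };

// Pick the strongest candidate per temporal bucket, then fill any remaining
// slots by score. `duration` is the video length in seconds.
function rankWithTemporalBuckets(
  candidates: Candidate[],
  duration: number,
  targetCount: number
): Candidate[] {
  if (targetCount <= 0 || duration <= 0) return [];
  const bucketSize = duration / targetCount;
  const best = new Map<number, Candidate>();

  for (const c of candidates) {
    // Clamp so a candidate exactly at `duration` lands in the last bucket
    const bucket = Math.min(targetCount - 1, Math.floor(c.time / bucketSize));
    const current = best.get(bucket);
    if (!current || c.score > current.score) best.set(bucket, c);
  }

  const picked = new Set<Candidate>(Array.from(best.values()));
  const leftovers = candidates
    .filter(c => !picked.has(c))
    .sort((a, b) => b.score - a.score);
  for (const c of leftovers) {
    if (picked.size >= targetCount) break;
    picked.add(c);
  }

  // Return in playback order
  return Array.from(picked).sort((a, b) => a.time - b.time);
}

// Three strong candidates cluster near the start; one weak one sits near the end.
const ranked = rankWithTemporalBuckets(
  [
    { time: 5, score: 0.9 },
    { time: 10, score: 0.8 },
    { time: 12, score: 0.7 },
    { time: 80, score: 0.3 },
  ],
  100,
  3
);
console.log(ranked.map(c => c.time));
```

With a plain top-3 by score the late candidate at 80 s would be dropped; the bucketed version keeps it because it is the only pick in the final third of the video.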

9. Full-resolution extraction

Only render full-resolution images for selected candidates.

    async function extractFullResolutionFrame(
      video: HTMLVideoElement,
      time: number,
      type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
      quality?: number
    ): Promise<Blob> {
      await seekVideo(video, time);

      // One temporary canvas at the video's native resolution
      const canvas = document.createElement("canvas");
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;

      const ctx = canvas.getContext("2d");
      if (!ctx) throw new Error("Could not get canvas context");

      ctx.drawImage(video, 0, 0);

      return new Promise((resolve, reject) => {
        canvas.toBlob(
          blob => {
            // toBlob passes null when encoding fails
            if (!blob) reject(new Error("Canvas export failed"));
            else resolve(blob);
          },
          type,
          quality
        );
      });
    }

For clipboard, prefer PNG:

    async function copyImageBlobToClipboard(blob: Blob) {
      // PNG is the image type browsers most commonly accept here;
      // other types may be rejected by the clipboard write
      await navigator.clipboard.write([
        new ClipboardItem({
          [blob.type]: blob,
        }),
      ]);
    }

Use:

    const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
    await copyImageBlobToClipboard(blob);

Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.

10. UI design

Main layout

    ┌──────────────────────────────────────────────┐
    │ Video Key Frame Extractor                    │
    ├──────────────────────────────────────────────┤
    │ [Select video] filename.mp4                  │
    │ Duration: 04:32 | Size: 612 MB | 1920×1080   │
    ├──────────────────────────────────────────────┤
    │ Scan Settings                                │
    │ Sample interval: [0.5s]                      │
    │ Change threshold: [18%]                      │
    │ Min gap: [2.0s]                              │
    │ Target images: [8]                           │
    │ [Scan Video]                                 │
    ├──────────────────────────────────────────────┤
    │ Progress bar                                 │
    ├──────────────────────────────────────────────┤
    │ Candidate Gallery                            │
    │ [thumb] 00:04.5 score .24 [Copy] [Download]  │
    │ [thumb] 00:18.0 score .41 [Copy] [Download]  │
    │ [thumb] 01:22.5 score .37 [Copy] [Download]  │
    └──────────────────────────────────────────────┘

Controls

Minimum controls:

- Select video
- Scan
- Stop scan
- Reset
- Threshold slider
- Sample interval dropdown
- Target count
- Copy image
- Download image

Advanced controls:

- “Prefer fewer images”
- “Prefer more images”
- “Ignore tiny motion”
- “Include first frame”
- “Manually add current preview frame”

11. State model

Use Zustand or plain React reducer.

    type AppState = {
      video: LoadedVideo | null;
      scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
      settings: ScanSettings;
      candidates: CandidateFrame[];
      // Assumes each candidate is given a unique id when it is stored
      selectedCandidateId: string | null;
      progress: number; // 0..1
      error: string | null;
    };

Actions:

    type AppActions = {
      loadVideo(file: File): Promise<void>;
      scanVideo(): Promise<void>;
      cancelScan(): void;
      updateSettings(settings: Partial<ScanSettings>): void;
      copyCandidate(candidateId: string): Promise<void>;
      downloadCandidate(candidateId: string): Promise<void>;
      removeCandidate(candidateId: string): void;
    };
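
The `cancelScan()` action needs some way to interrupt the sampling loop. One possible approach, sketched here as an assumption rather than a required design, is to pass an `AbortSignal` into the loop and check it between samples. `scanWithCancel` and `processSample` are hypothetical names; `processSample` stands in for "seek, capture, compare" at one timestamp:

```typescript
// Hypothetical cancellable loop. Returns how the scan ended.
async function scanWithCancel(
  sampleTimes: number[],
  processSample: (time: number) => Promise<void>,
  signal: AbortSignal
): Promise<"done" | "cancelled"> {
  for (const time of sampleTimes) {
    // Checked between samples, so a click on "Stop scan" takes effect
    // as soon as the in-flight sample finishes.
    if (signal.aborted) return "cancelled";
    await processSample(time);
  }
  return "done";
}

// Usage: the store keeps the controller and calls abort() from cancelScan().
const controller = new AbortController();
const visited: number[] = [];
const scanResult = scanWithCancel(
  [0, 0.5, 1.0, 1.5],
  async time => {
    visited.push(time);
    if (time === 0.5) controller.abort(); // simulate the user clicking Stop
  },
  controller.signal
);
scanResult.then(status => console.log(status, visited));
```

A fresh `AbortController` would be created for each scan, since a signal cannot be reset once aborted.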

12. Worker design

For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.

Main thread:

- seek video
- draw frame to analysis canvas
- get ImageData
- send ImageData buffer to worker
- worker returns diff score

Worker message:

    type AnalyzeFrameMessage = {
      type: "analyze-frame";
      current: ImageData;
      previous: ImageData;
      pixelDeltaThreshold: number;
    };

Worker response:

    type AnalyzeFrameResult = {
      type: "frame-score";
      diffRatio: number;
    };

For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.
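
One way to keep that move cheap is to put the comparison in a pure handler function that the worker merely forwards messages to. The sketch below is an assumption about how this could be organized: it carries raw RGBA bytes instead of `ImageData` so it can be exercised outside a browser, and `handleAnalyzeFrame` and both type names are hypothetical:

```typescript
// Message shapes modeled on the ones above, but with raw RGBA bytes.
type AnalyzeFrameRequest = {
  type: "analyze-frame";
  current: Uint8ClampedArray;
  previous: Uint8ClampedArray;
  pixelDeltaThreshold: number;
};
type AnalyzeFrameReply = { type: "frame-score"; diffRatio: number };

// Pure handler: all analysis logic lives here, with no worker globals.
function handleAnalyzeFrame(msg: AnalyzeFrameRequest): AnalyzeFrameReply {
  const { current, previous, pixelDeltaThreshold } = msg;
  if (current.length !== previous.length) throw new Error("Frame sizes differ");

  let changed = 0;
  for (let i = 0; i < current.length; i += 4) {
    const lumA = 0.299 * current[i] + 0.587 * current[i + 1] + 0.114 * current[i + 2];
    const lumB = 0.299 * previous[i] + 0.587 * previous[i + 1] + 0.114 * previous[i + 2];
    if (Math.abs(lumA - lumB) > pixelDeltaThreshold) changed++;
  }
  return { type: "frame-score", diffRatio: changed / (current.length / 4) };
}

// Inside the worker file, the wiring would be roughly:
//   self.onmessage = (e: MessageEvent<AnalyzeFrameRequest>) =>
//     self.postMessage(handleAnalyzeFrame(e.data));

// Two-pixel frames: the first pixel is unchanged, the second goes black -> white.
const reply = handleAnalyzeFrame({
  type: "analyze-frame",
  current: new Uint8ClampedArray([0, 0, 0, 255, 255, 255, 255, 255]),
  previous: new Uint8ClampedArray([0, 0, 0, 255, 0, 0, 0, 255]),
  pixelDeltaThreshold: 32,
});
console.log(reply);
```

A plain `postMessage` copies the frame data via structured clone; at 160×90 that is about 57.6 KB per frame, so the copy is unlikely to matter.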

13. Performance expectations

For a 10-minute video:

    Sample interval: 0.5s
    Comparisons: ~1200
    Analysis resolution: 160×90
    Pixels per comparison: 14,400
    Total pixel comparisons: ~17.3 million

That is very reasonable in browser JavaScript.

For a 30-minute video:

    Sample interval: 0.5s
    Comparisons: ~3600
    Analysis resolution: 160×90
    Total pixel comparisons: ~51.8 million

Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
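
The figures above follow from simple arithmetic, which can be reproduced for other settings. `estimateScanCost` is an illustrative helper, not part of the design, and it counts only the pixel math, not seek or decode time:

```typescript
// Rough cost model: one comparison per sample, one luminance check per pixel.
function estimateScanCost(
  durationSeconds: number,
  sampleIntervalSeconds: number,
  analysisWidth: number,
  analysisHeight: number
): { samples: number; pixelsPerSample: number; totalPixelComparisons: number } {
  const samples = Math.ceil(durationSeconds / sampleIntervalSeconds);
  const pixelsPerSample = analysisWidth * analysisHeight;
  return { samples, pixelsPerSample, totalPixelComparisons: samples * pixelsPerSample };
}

console.log(estimateScanCost(10 * 60, 0.5, 160, 90)); // 10-minute video
console.log(estimateScanCost(30 * 60, 0.5, 160, 90)); // 30-minute video
```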

14. File-size considerations

Your 100–1000 MB range is practical because:

- The video is referenced via local file blob URL.
- You only decode sampled frames.
- You only store 4–10 output blobs.
- Analysis frames are tiny.

Memory use should look roughly like:

    Video decoder buffers: browser-managed
    Analysis frame A: ~160 × 90 × 4 = 57.6 KB
    Analysis frame B: ~57.6 KB
    Full-res output image: only when accepted/copied
    Candidate blobs: 4–10 images

At 1920×1080, one raw RGBA canvas frame is about:

    1920 × 1080 × 4 = ~8.3 MB

At 3840×2160:

    3840 × 2160 × 4 = ~33.2 MB

Still fine if you only create one full-res canvas at a time.
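
The same calculation as a small helper, should the app want to warn before allocating a very large canvas. `rgbaFrameBytes` is an illustrative name, and the figure covers only the raw pixel buffer, not whatever extra memory the browser uses internally:

```typescript
// Raw size of one RGBA frame: 4 bytes (red, green, blue, alpha) per pixel.
function rgbaFrameBytes(width: number, height: number): number {
  return width * height * 4;
}

// Decimal megabytes, matching the rounded figures quoted above.
const toMB = (bytes: number): string => (bytes / 1_000_000).toFixed(1) + " MB";

console.log("160×90:", rgbaFrameBytes(160, 90), "bytes");
console.log("1920×1080:", toMB(rgbaFrameBytes(1920, 1080)));
console.log("3840×2160:", toMB(rgbaFrameBytes(3840, 2160)));
```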

15. Browser compatibility

Best target

Start with Chromium-based browsers:

- Chrome
- Edge
- Brave
- Arc
- Chromium

Reasons:

- Strong media support.
- Good Clipboard API support.
- Good WebCodecs path for later.

Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.

Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.

16. Security and privacy model

The app should explicitly state:

- Your video is processed locally in your browser.
- The file is not uploaded.
- No server receives the video or extracted images.

Implementation requirements:

- Do not send file contents to analytics.
- Do not log object URLs remotely.
- Avoid third-party scripts if privacy is part of the value prop.
- Serve over HTTPS.
- Make the app installable as a PWA if desired.

17. Error handling

Handle:

| Case | Response |
| --- | --- |
| Unsupported video codec | Show “This browser cannot decode this video.” |
| Huge resolution | Offer lower-quality extraction or warn about memory. |
| Clipboard blocked | Offer download fallback. |
| Scan too slow | Let user increase sample interval. |
| Too many candidates | Raise threshold or bucket results. |
| No candidates found | Lower threshold or include first/middle/end frames. |

Clipboard fallback:

    try {
      await copyImageBlobToClipboard(blob);
    } catch {
      downloadBlob(blob, "frame.png");
    }
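
The fallback calls a `downloadBlob` helper that is not defined anywhere above. A minimal sketch of what it might look like, using a temporary link with the `download` attribute; it is browser-only, and the deferred revoke is a common precaution rather than a documented requirement:

```typescript
// Hypothetical helper matching the downloadBlob(blob, "frame.png") call above.
// Browser-only: relies on `document` and URL.createObjectURL.
function downloadBlob(blob: Blob, filename: string): void {
  const url = URL.createObjectURL(blob);

  // A temporary <a download> element triggers the browser's save flow
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();

  // Release the blob URL after the click has been dispatched
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

Including the candidate's timestamp in the file name would make several downloads from the same video easier to tell apart.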

18. Suggested file structure

    src/
      app/
        App.tsx
        store.ts
      components/
        FilePicker.tsx
        VideoMetadataPanel.tsx
        ScanSettingsPanel.tsx
        ProgressPanel.tsx
        CandidateGallery.tsx
        CandidateCard.tsx
      media/
        loadVideo.ts
        seekVideo.ts
        captureFrame.ts
        extractFullFrame.ts
        clipboard.ts
      analysis/
        frameDifference.ts
        histogramDifference.ts
        candidateRanking.ts
        scanVideo.ts
      workers/
        frameAnalysis.worker.ts
      types/
        media.ts
        scan.ts

19. MVP implementation plan

Phase 1: Core prototype

Build:

- File picker
- Hidden video loader
- Metadata display
- Scan button
- Basic frame-difference detection
- Candidate list with timestamps
- Thumbnail generation

No worker yet.

Phase 2: Copy/export

Add:

- Full-res frame extraction
- Copy PNG to clipboard
- Download PNG fallback
- Remove candidate
- Manual add candidate

Phase 3: Better detection

Add:

- Histogram comparison
- Temporal bucketing
- Threshold presets
- “Strict / Balanced / Sensitive” modes

Example presets:

    const presets = {
      strict: {
        sampleIntervalSeconds: 1.0,
        changedPixelRatioThreshold: 0.28,
        minSecondsBetweenCaptures: 4,
      },
      balanced: {
        sampleIntervalSeconds: 0.5,
        changedPixelRatioThreshold: 0.18,
        minSecondsBetweenCaptures: 2,
      },
      sensitive: {
        sampleIntervalSeconds: 0.25,
        changedPixelRatioThreshold: 0.1,
        minSecondsBetweenCaptures: 1,
      },
    };

Phase 4: Worker optimization

Move frame comparison into a worker.

Phase 5: Advanced media backend

Evaluate:

- Mediabunny + WebCodecs

This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.

20. Final recommendation

For your app, build V1 with:

- Vite + React + TypeScript
- HTMLVideoElement
- Canvas
- Blob URLs
- Clipboard API
- Optional Zustand

Do not start with:

- ffmpeg.wasm
- OpenCV.js
- Server-side processing

Best V1 strategy:

- Sample every 0.5s
- Analyze at 160×90
- Compare against last accepted key frame
- Use threshold + minimum time gap
- Rank candidates down to 4–10
- Extract full-res only when displaying/copying