Frame-Analyzer/design-document.md

Recommended approach

Use:

<input type="file">
→ URL.createObjectURL(file)
→ hidden <video>
→ sample frames periodically
→ draw sampled frame to canvas
→ compare against previous accepted frame
→ keep only significant-change frames
→ user clicks Copy on selected image
→ navigator.clipboard.write()

URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.

Do not analyze every frame at full resolution

For change detection, use a small analysis canvas:

video frame → draw to 160×90 or 320×180 canvas → compare pixels

Then only render the full-resolution frame when you decide it is worth keeping.

Example strategy:

const ANALYSIS_W = 160;
const ANALYSIS_H = 90;
const SAMPLE_INTERVAL = 0.5; // seconds
const CHANGE_THRESHOLD = 0.18; // tune this

That means a 10-minute video sampled every 0.5 seconds is only:

10 min × 60 / 0.5 = 1200 comparisons

At 160×90, that is cheap.

Simple frame-difference algorithm

Use a downscaled canvas and compare luminance differences:

function frameDifference(a, b) {
  let changed = 0;
  const pixels = a.data.length / 4;

  for (let i = 0; i < a.data.length; i += 4) {
    const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
    const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];

    if (Math.abs(lumA - lumB) > 32) changed++;
  }

  return changed / pixels;
}

Then:

if (diff > CHANGE_THRESHOLD) {
  // save this timestamp as a candidate key frame
}

Better heuristic

Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.

That avoids saving several frames during the same transition.

Frame A accepted
Frame B slightly different → reject
Frame C slightly different → reject
Frame D substantially different from A → accept

Also add a minimum time gap:

const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;

Clipboard copying

Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.

Example:

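async function copyCanvasToClipboard(canvas) {
  // toBlob can hand back null if the export fails, so reject instead of
  // passing null into ClipboardItem.
  const blob = await new Promise((resolve, reject) =>
    canvas.toBlob(
      b => (b ? resolve(b) : reject(new Error("Canvas export failed"))),
      "image/png"
    )
  );

  await navigator.clipboard.write([
    new ClipboardItem({
      "image/png": blob
    })
  ]);
}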

Important: this should happen from a user gesture, like clicking a Copy button.

Memory profile

This is light if implemented correctly:

1 local video blob URL
1 hidden video element
1 small analysis canvas
1 full-res preview canvas/image
4–10 saved output blobs

Avoid:

Storing hundreds/thousands of full-res canvases
Storing every sampled frame
Using FileReader.readAsDataURL(video)
Exporting a huge ZIP unless needed

For your target, keeping 4–10 PNG/WebP blobs in memory is fine.

Practical UX flow

I would build it like this:

User selects video.
App scans with adjustable settings:
  sample interval
  change threshold
  minimum seconds between captures
  max captures
App shows candidate thumbnails.
User deletes bad picks if needed.
Each thumbnail has:
  Copy PNG
  Download
  timestamp
Optional: button to rescan with stricter/looser threshold.

Caveats

Plain <video> seeking may not be perfectly frame-accurate.
Sampling every 0.25–1.0s is much more practical than decoding every frame.
Browser codec support matters. MP4/H.264 is safest.
Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.
For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.

My recommended V1 stack

React/Vite
HTMLVideoElement
Canvas API
Blob/Object URLs
navigator.clipboard.write()
Optional: localForage or IndexedDB if you want session persistence

No backend required. No upload required.

What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document.

I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.

Recommendation

Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:

Vite + React + TypeScript
HTMLVideoElement
Canvas / OffscreenCanvas
Web Worker
Blob URLs
Clipboard API

For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a WebAssembly port of FFmpeg that runs in the browser, and in its usual setup the input is first written into an in-memory virtual file system, which is why so many real-world reports and project issues revolve around large-file memory limits and browser stability.

For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.

Design Document: Local Video Key-Frame Extractor
1. Goal

Build a browser-only app that allows a user to:

Select a local video file.
Analyze it without uploading to a server.
Detect frames where the image content has significantly changed.
Present a small set of candidate key images.
Let the user copy selected images to the clipboard.

Target input:

Video size: 100–1000 MB
Expected output: 4–10 still images
Processing model: local-only
Backend: none required

2. Recommended architecture

User selects video
    ↓
Create local Blob URL
    ↓
Hidden <video> element loads local file
    ↓
Sampling engine seeks through video
    ↓
Downscaled analysis canvas compares frames
    ↓
Candidate detector stores timestamps
    ↓
Full-resolution renderer extracts selected frames
    ↓
Gallery shows thumbnails
    ↓
User copies PNG to clipboard

Core browser APIs:

| Task | API |
| --- | --- |
| Local file selection | File API |
| Local video loading | URL.createObjectURL() |
| Video decoding/seeking | HTMLVideoElement |
| Pixel comparison | Canvas API |
| Off-main-thread analysis | Worker + OffscreenCanvas where available |
| Image export | canvas.toBlob() |
| Clipboard copy | navigator.clipboard.write() + ClipboardItem |

URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.

3. Framework choice
Best V1 stack
Vite
React
TypeScript
Zustand or simple React state
Canvas API
Web Worker

Why:

Vite: fast local dev, easy static deployment.
React: good for gallery, controls, progress UI, and stateful review flow.
TypeScript: useful because media processing code quickly accumulates edge cases.
No backend: app can deploy as static files.
No heavy media dependency initially: browser decoding is enough for threshold-based change detection.

Suggested install:

npm create vite@latest video-keyframe-extractor -- --template react-ts
cd video-keyframe-extractor
npm install
npm run dev

Optional dependencies:

npm install zustand

I would avoid extra UI frameworks unless you want polished controls quickly.

4. Library/API decision matrix
| Option | Use for V1? | Notes |
| --- | --- | --- |
| Native <video> + canvas | Yes | Best starting point. Simple, no server, no large WASM payload. |
| Web Worker | Yes | Keeps UI responsive during scans. |
| OffscreenCanvas | Yes, optional | Good for worker-side pixel analysis where supported. |
| WebCodecs | V2 | Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames. |
| Mediabunny | V2 / advanced | Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed. |
| ffmpeg.wasm | Avoid for V1 | Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection. |
| OpenCV.js | Probably avoid | Useful for advanced CV, but heavy for simple scene-difference detection. |

5. App modules
5.1 File intake module

Responsibilities:

Accept local video file.
Validate file type and size.
Create blob URL.
Load video metadata.
Display duration, resolution, codec/browser compatibility status if possible.

Example state:

type LoadedVideo = {
  file: File;
  objectUrl: string;
  duration: number;
  width: number;
  height: number;
};

Important cleanup:

URL.revokeObjectURL(objectUrl);

Never use this for the video:

FileReader.readAsDataURL(file);

That would load and base64-expand the whole video in memory.

5.2 Scan settings module

Expose these controls:

type ScanSettings = {
  sampleIntervalSeconds: number;       // default: 0.5
  analysisWidth: number;               // default: 160
  analysisHeight: number;              // default: 90
  pixelDeltaThreshold: number;         // default: 32
  changedPixelRatioThreshold: number;  // default: 0.18
  minSecondsBetweenCaptures: number;   // default: 2.0
  maxCandidates: number;               // default: 20
  finalTargetCount: number;            // default: 8
};

Suggested defaults:

const defaultSettings: ScanSettings = {
  sampleIntervalSeconds: 0.5,
  analysisWidth: 160,
  analysisHeight: 90,
  pixelDeltaThreshold: 32,
  changedPixelRatioThreshold: 0.18,
  minSecondsBetweenCaptures: 2.0,
  maxCandidates: 20,
  finalTargetCount: 8,
};

5.3 Frame sampling module

V1 should seek using a hidden video element:

async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const onSeeked = () => {
      cleanup();
      resolve();
    };

    const onError = () => {
      cleanup();
      reject(video.error);
    };

    const cleanup = () => {
      video.removeEventListener("seeked", onSeeked);
      video.removeEventListener("error", onError);
    };

    video.addEventListener("seeked", onSeeked, { once: true });
    video.addEventListener("error", onError, { once: true });

    video.currentTime = Math.min(time, video.duration);
  });
}

Then draw into a small analysis canvas:

function captureAnalysisFrame(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  width: number,
  height: number
): ImageData {
  canvas.width = width;
  canvas.height = height;

  const ctx = canvas.getContext("2d", { willReadFrequently: true });
  if (!ctx) throw new Error("Could not get canvas context");

  ctx.drawImage(video, 0, 0, width, height);
  return ctx.getImageData(0, 0, width, height);
}

6. Change detection algorithm
6.1 Simple and effective V1

Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.

This prevents over-detecting long transitions.

function frameDifferenceRatio(
  a: ImageData,
  b: ImageData,
  pixelDeltaThreshold: number
): number {
  const dataA = a.data;
  const dataB = b.data;
  let changed = 0;
  const pixels = dataA.length / 4;

  for (let i = 0; i < dataA.length; i += 4) {
    const lumA =
      0.299 * dataA[i] +
      0.587 * dataA[i + 1] +
      0.114 * dataA[i + 2];

    const lumB =
      0.299 * dataB[i] +
      0.587 * dataB[i + 1] +
      0.114 * dataB[i + 2];

    if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
      changed++;
    }
  }

  return changed / pixels;
}

Acceptance logic:

function shouldAcceptCandidate(params: {
  diffRatio: number;
  currentTime: number;
  lastAcceptedTime: number | null;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
}) {
  const {
    diffRatio,
    currentTime,
    lastAcceptedTime,
    changedPixelRatioThreshold,
    minSecondsBetweenCaptures,
  } = params;

  if (diffRatio < changedPixelRatioThreshold) return false;

  if (
    lastAcceptedTime !== null &&
    currentTime - lastAcceptedTime < minSecondsBetweenCaptures
  ) {
    return false;
  }

  return true;
}

6.2 Better V1.5: histogram difference

Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.

A better approach is to compare low-resolution color/luma histograms.

Recommended hybrid score:

score = 0.7 × luminance histogram difference
      + 0.3 × pixel difference ratio

This tends to better detect scene/content changes instead of tiny motion.
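The hybrid score could be computed along these lines. This is a minimal sketch rather than a tuned detector: the 32-bin luminance histogram, the half-L1 distance, and the names `Frame`, `lumaHistogram`, `histogramDifference`, and `hybridScore` are illustrative assumptions; only the 0.7 / 0.3 weighting comes from the text.

```typescript
// Structural stand-in for the browser's ImageData, so the sketch does not depend on the DOM.
type Frame = { data: Uint8ClampedArray; width: number; height: number };

const BINS = 32; // assumption: 32 luminance bins

// Normalized luminance histogram: each bin holds the fraction of pixels that fall in it.
function lumaHistogram(frame: Frame): Float64Array {
  const hist = new Float64Array(BINS);
  const pixels = frame.data.length / 4;
  for (let i = 0; i < frame.data.length; i += 4) {
    const lum =
      0.299 * frame.data[i] +
      0.587 * frame.data[i + 1] +
      0.114 * frame.data[i + 2];
    const bin = Math.min(BINS - 1, Math.floor((lum / 256) * BINS));
    hist[bin] += 1 / pixels;
  }
  return hist;
}

// Half the L1 distance between the two normalized histograms, giving a value in [0, 1].
function histogramDifference(a: Frame, b: Frame): number {
  const ha = lumaHistogram(a);
  const hb = lumaHistogram(b);
  let sum = 0;
  for (let i = 0; i < BINS; i++) sum += Math.abs(ha[i] - hb[i]);
  return sum / 2;
}

// Weighted blend from the text: 0.7 × histogram difference + 0.3 × pixel difference ratio.
function hybridScore(histDiff: number, pixelDiffRatio: number): number {
  return 0.7 * histDiff + 0.3 * pixelDiffRatio;
}
```

Because a histogram ignores where pixels are, a small object moving across an otherwise unchanged frame barely shifts it, while the pixel ratio term still contributes when large regions change in place.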

6.3 Optional V2: perceptual hash

For more stable detection:

Generate a small grayscale frame, e.g. 32×32.
Compute average hash or difference hash.
Compare Hamming distance.
Accept when distance exceeds threshold.

This works well when frames differ by compression noise that should be ignored, while meaningful visual changes still need to be caught.
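The steps above can be sketched with an average hash. This assumes the frame has already been reduced to one luminance byte per pixel (for example 32 × 32 = 1024 values); the function names and the one-byte-per-bit hash layout are illustrative choices, not a fixed format.

```typescript
// Average hash: one bit per pixel, set when the pixel is brighter than the frame's mean.
// `gray` is assumed to hold one luminance byte per pixel.
function averageHash(gray: Uint8Array): Uint8Array {
  let sum = 0;
  for (let i = 0; i < gray.length; i++) sum += gray[i];
  const mean = sum / gray.length;

  const bits = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) bits[i] = gray[i] > mean ? 1 : 0;
  return bits;
}

// Hamming distance: the number of positions where the two hashes disagree.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hashes must have equal length");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Accept a frame when its hash differs from the reference hash in more than `threshold` bits.
function isSignificantChange(
  reference: Uint8Array,
  current: Uint8Array,
  threshold: number
): boolean {
  return hammingDistance(averageHash(reference), averageHash(current)) > threshold;
}
```

Since each bit only records whether a pixel is above or below the mean, a uniform brightness shift leaves the hash unchanged, which is the stability property this approach is after.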

7. Scan workflow

Pseudo-code:

async function scanVideo(
  video: HTMLVideoElement,
  settings: ScanSettings,
  onProgress: (progress: number) => void
): Promise<CandidateFrame[]> {
  const analysisCanvas = document.createElement("canvas");

  const candidates: CandidateFrame[] = [];
  let lastAcceptedFrame: ImageData | null = null;
  let lastAcceptedTime: number | null = null;

  for (
    let time = 0;
    time < video.duration;
    time += settings.sampleIntervalSeconds
  ) {
    await seekVideo(video, time);

    const imageData = captureAnalysisFrame(
      video,
      analysisCanvas,
      settings.analysisWidth,
      settings.analysisHeight
    );

    if (!lastAcceptedFrame) {
      candidates.push({
        time,
        score: 1,
        reason: "initial-frame",
      });

      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
      continue;
    }

    const diffRatio = frameDifferenceRatio(
      lastAcceptedFrame,
      imageData,
      settings.pixelDeltaThreshold
    );

    if (
      shouldAcceptCandidate({
        diffRatio,
        currentTime: time,
        lastAcceptedTime,
        changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
        minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
      })
    ) {
      candidates.push({
        time,
        score: diffRatio,
        reason: "visual-change",
      });

      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
    }

    onProgress(time / video.duration);

    if (candidates.length >= settings.maxCandidates) {
      // optional: stop early or continue and later rank candidates
    }
  }

  onProgress(1); // the loop reports time / duration, which never reaches 1 on its own

  return rankAndTrimCandidates(candidates, settings.finalTargetCount);
}
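The loop above advances `time` by repeated addition and trusts `video.duration` to be a finite number. One possible hardening, sketched here under the assumption that a precomputed list of timestamps is acceptable, derives each timestamp from an integer index and returns nothing for unusable durations. `sampleTimes` is a hypothetical helper name.

```typescript
// Build the list of timestamps to sample. Each value is computed from an integer
// index, so rounding error from repeated addition cannot accumulate.
function sampleTimes(durationSeconds: number, intervalSeconds: number): number[] {
  // video.duration is NaN before metadata loads and Infinity for unbounded streams.
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) return [];
  if (!Number.isFinite(intervalSeconds) || intervalSeconds <= 0) return [];

  const times: number[] = [];
  for (let i = 0; i * intervalSeconds < durationSeconds; i++) {
    times.push(i * intervalSeconds);
  }
  return times;
}
```

The scan loop could then iterate `for (const time of sampleTimes(video.duration, settings.sampleIntervalSeconds))`, and the list length doubles as the total for progress reporting.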

Candidate type:

type CandidateFrame = {
  time: number;
  score: number;
  reason: "initial-frame" | "visual-change" | "manual";
  thumbnailUrl?: string;
  fullBlob?: Blob;
};

8. Candidate ranking

Since the desired final count is only 4–10, do not simply take every threshold crossing.

Use ranking:

Always include first frame if useful.
Sort detected changes by score.
Enforce time diversity.
Keep the top N.

Example:

function rankAndTrimCandidates(
  candidates: CandidateFrame[],
  targetCount: number
): CandidateFrame[] {
  return [...candidates]
    .sort((a, b) => b.score - a.score)
    .slice(0, targetCount)
    .sort((a, b) => a.time - b.time);
}

Better version:

Divide video into N temporal buckets
Pick the strongest candidate from each bucket
Then fill remaining slots by highest score

This avoids ending up with all images from one active segment.
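The bucketed version might look like the following. It is a sketch under assumptions: buckets are equal-length slices of the duration, one per target slot, and `rankWithTemporalBuckets` is a hypothetical name that only needs the `time` and `score` fields of a candidate.

```typescript
type Candidate = { time: number; score: number };

function rankWithTemporalBuckets<T extends Candidate>(
  candidates: T[],
  durationSeconds: number,
  targetCount: number
): T[] {
  if (targetCount <= 0 || durationSeconds <= 0) return [];

  const bucketSize = durationSeconds / targetCount;
  const bestPerBucket = new Map<number, T>();

  // Pass 1: keep the strongest candidate in each of the N time buckets.
  for (const c of candidates) {
    const bucket = Math.min(targetCount - 1, Math.floor(c.time / bucketSize));
    const best = bestPerBucket.get(bucket);
    if (!best || c.score > best.score) bestPerBucket.set(bucket, c);
  }

  const picked = new Set<T>(Array.from(bestPerBucket.values()));

  // Pass 2: fill any remaining slots with the highest-scoring leftovers.
  const leftovers = candidates
    .filter(c => !picked.has(c))
    .sort((a, b) => b.score - a.score);
  for (const c of leftovers) {
    if (picked.size >= targetCount) break;
    picked.add(c);
  }

  // Return in playback order for the gallery.
  return Array.from(picked).sort((a, b) => a.time - b.time);
}
```

With a 100-second video and a target of 4, three strong candidates in the first few seconds no longer crowd out weaker ones at 30 s and 60 s; only the empty last bucket is backfilled by score.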

9. Full-resolution extraction

Only render full-resolution images for selected candidates.

async function extractFullResolutionFrame(
  video: HTMLVideoElement,
  time: number,
  type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
  quality?: number
): Promise<Blob> {
  await seekVideo(video, time);

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("Could not get canvas context");

  ctx.drawImage(video, 0, 0);

  return new Promise((resolve, reject) => {
    canvas.toBlob(
      blob => {
        if (!blob) reject(new Error("Canvas export failed"));
        else resolve(blob);
      },
      type,
      quality
    );
  });
}

For clipboard, prefer PNG:

async function copyImageBlobToClipboard(blob: Blob) {
  await navigator.clipboard.write([
    new ClipboardItem({
      [blob.type]: blob,
    }),
  ]);
}

Use:

const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
await copyImageBlobToClipboard(blob);

Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.
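Before showing a Copy button, the app could check whether image copying looks possible at all. The sketch below is an assumption-laden helper, not a guarantee: `canCopyImage` and `ClipboardEnv` are hypothetical names, the environment is passed in so the check can be exercised with a fake, and `ClipboardItem.supports()` is treated as optional because not every browser provides it.

```typescript
// Minimal shape of the globals this check reads.
type ClipboardEnv = {
  isSecureContext?: boolean;
  navigator?: { clipboard?: { write?: unknown } };
  ClipboardItem?: { supports?: (type: string) => boolean };
};

function canCopyImage(
  mimeType: string = "image/png",
  env: ClipboardEnv = globalThis as unknown as ClipboardEnv
): boolean {
  // The Clipboard API is only exposed in secure contexts (HTTPS or localhost).
  if (!env.isSecureContext) return false;
  if (typeof env.navigator?.clipboard?.write !== "function") return false;

  const item = env.ClipboardItem;
  if (!item) return false;

  // When supports() is missing, assume the type may work and rely on the
  // try/catch around the actual write for the final answer.
  return typeof item.supports === "function" ? item.supports(mimeType) : true;
}
```

A `true` result only means the write is worth attempting; permission prompts and user-activation rules can still make `navigator.clipboard.write()` reject, so the download fallback remains necessary.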

10. UI design
Main layout
┌──────────────────────────────────────────────┐
│ Video Key Frame Extractor                    │
├──────────────────────────────────────────────┤
│ [Select video]  filename.mp4                 │
│ Duration: 04:32 | Size: 612 MB | 1920×1080   │
├──────────────────────────────────────────────┤
│ Scan Settings                                │
│ Sample interval: [0.5s]                      │
│ Change threshold: [18%]                      │
│ Min gap: [2.0s]                              │
│ Target images: [8]                           │
│ [Scan Video]                                 │
├──────────────────────────────────────────────┤
│ Progress bar                                 │
├──────────────────────────────────────────────┤
│ Candidate Gallery                            │
│ [thumb] 00:04.5 score .24 [Copy] [Download]  │
│ [thumb] 00:18.0 score .41 [Copy] [Download]  │
│ [thumb] 01:22.5 score .37 [Copy] [Download]  │
└──────────────────────────────────────────────┘
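The gallery rows above show timestamps such as 00:04.5. A small formatter along these lines could produce them; `formatTimestamp` is a hypothetical helper, and the mm:ss.t layout with one decimal is inferred from the mock-up rather than specified anywhere.

```typescript
// Format seconds as mm:ss.t (minutes, seconds, tenths), e.g. 82.5 becomes "01:22.5".
function formatTimestamp(seconds: number): string {
  // Work in whole tenths so rounding can carry cleanly into seconds and minutes.
  const tenths = Math.max(0, Math.round(seconds * 10));
  const minutes = Math.floor(tenths / 600);
  const wholeSeconds = Math.floor((tenths % 600) / 10);
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return pad(minutes) + ":" + pad(wholeSeconds) + "." + (tenths % 10);
}
```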
Controls

Minimum controls:

Select video
Scan
Stop scan
Reset
Threshold slider
Sample interval dropdown
Target count
Copy image
Download image

Advanced controls:

“Prefer fewer images”
“Prefer more images”
“Ignore tiny motion”
“Include first frame”
“Manually add current preview frame”

11. State model

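Use Zustand or a plain React reducer. The candidate IDs used below assume each CandidateFrame carries a stable id (for example, one derived from its timestamp); the CandidateFrame type as listed in the scan workflow does not yet include that field.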

type AppState = {
  video: LoadedVideo | null;
  scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
  settings: ScanSettings;
  candidates: CandidateFrame[];
  selectedCandidateId: string | null;
  progress: number;
  error: string | null;
};

Actions:

type AppActions = {
  loadVideo(file: File): Promise<void>;
  scanVideo(): Promise<void>;
  cancelScan(): void;
  updateSettings(settings: Partial<ScanSettings>): void;
  copyCandidate(candidateId: string): Promise<void>;
  downloadCandidate(candidateId: string): Promise<void>;
  removeCandidate(candidateId: string): void;
};

12. Worker design

For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.

Main thread:

seek video
draw frame to analysis canvas
get ImageData
send ImageData buffer to worker
worker returns diff score

Worker message:

type AnalyzeFrameMessage = {
  type: "analyze-frame";
  current: ImageData;
  previous: ImageData; // reference frame: the last accepted key frame, not the prior sample
  pixelDeltaThreshold: number;
};

Worker response:

type AnalyzeFrameResult = {
  type: "frame-score";
  diffRatio: number;
};

For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.

13. Performance expectations

For a 10-minute video:

Sample interval: 0.5s
Comparisons: ~1200
Analysis resolution: 160×90
Pixels per comparison: 14,400
Total pixel comparisons: ~17.3 million

That is very reasonable in browser JavaScript.

For a 30-minute video:

Sample interval: 0.5s
Comparisons: ~3600
Analysis resolution: 160×90
Total pixel comparisons: ~51.8 million

Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
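The arithmetic behind these figures can be reproduced for other durations and settings. `estimateScanCost` is a hypothetical helper that counts pixel comparisons only; it says nothing about seek or decode time, which the text identifies as the likely bottleneck.

```typescript
type ScanCost = {
  comparisons: number;
  pixelsPerComparison: number;
  totalPixelComparisons: number;
};

function estimateScanCost(
  durationSeconds: number,
  sampleIntervalSeconds: number,
  analysisWidth: number,
  analysisHeight: number
): ScanCost {
  // One comparison per sampled frame.
  const comparisons = Math.ceil(durationSeconds / sampleIntervalSeconds);
  const pixelsPerComparison = analysisWidth * analysisHeight;
  return {
    comparisons,
    pixelsPerComparison,
    totalPixelComparisons: comparisons * pixelsPerComparison,
  };
}
```

For a 10-minute video, `estimateScanCost(600, 0.5, 160, 90)` gives 1200 comparisons of 14,400 pixels each, or 17,280,000 pixel comparisons in total.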

14. File-size considerations

Your 100–1000 MB range is practical because:

The video is referenced via local file blob URL.
You only decode sampled frames.
You only store 4–10 output blobs.
Analysis frames are tiny.

Memory use should look roughly like:

Video decoder buffers: browser-managed
Analysis frame A: ~160 × 90 × 4 = 57.6 KB
Analysis frame B: ~57.6 KB
Full-res output image: only when accepted/copied
Candidate blobs: 4–10 images

At 1920×1080, one raw RGBA canvas frame is about:

1920 × 1080 × 4 = ~8.3 MB

At 3840×2160:

3840 × 2160 × 4 = ~33.2 MB

Still fine if you only create one full-res canvas at a time.
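The same width × height × 4 arithmetic can drive a guard for very large sources. In this sketch `rawFrameBytes` and `fitWithinBudget` are hypothetical helpers, and any byte budget passed in is an app-level choice, not a browser limit.

```typescript
// Raw RGBA size of a canvas frame: 4 bytes per pixel.
function rawFrameBytes(width: number, height: number): number {
  return width * height * 4;
}

// Scale dimensions down (never up) so the raw frame fits within maxBytes,
// preserving the aspect ratio.
function fitWithinBudget(
  width: number,
  height: number,
  maxBytes: number
): { width: number; height: number } {
  const bytes = rawFrameBytes(width, height);
  if (bytes <= maxBytes) return { width, height };

  // Area scales with the square of the linear factor, hence the square root.
  const scale = Math.sqrt(maxBytes / bytes);
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
  };
}
```

For example, capping a 3840×2160 source at the size of one 1080p frame (8,294,400 bytes) yields 1920×1080, which the extraction step could pass to `drawImage` as the destination size.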

15. Browser compatibility
Best target

Start with Chromium-based browsers:

Chrome
Edge
Brave
Arc
Chromium

Reasons:

Strong media support.
Good Clipboard API support.
Good WebCodecs path for later.

Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.

Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.

16. Security and privacy model

The app should explicitly state:

Your video is processed locally in your browser.
The file is not uploaded.
No server receives the video or extracted images.

Implementation requirements:

Do not send file contents to analytics.
Do not log object URLs remotely.
Avoid third-party scripts if privacy is part of the value prop.
Serve over HTTPS.
Make the app installable as a PWA if desired.

17. Error handling

Handle:

| Case | Response |
| --- | --- |
| Unsupported video codec | Show “This browser cannot decode this video.” |
| Huge resolution | Offer lower-quality extraction or warn about memory. |
| Clipboard blocked | Offer download fallback. |
| Scan too slow | Let user increase sample interval. |
| Too many candidates | Raise threshold or bucket results. |
| No candidates found | Lower threshold or include first/middle/end frames. |

Clipboard fallback:

try {
  await copyImageBlobToClipboard(blob);
} catch {
  // downloadBlob is an app-defined helper (not a browser API) that saves the blob as a file.
  downloadBlob(blob, "frame.png");
}

18. Suggested file structure
src/
  app/
    App.tsx
    store.ts
  components/
    FilePicker.tsx
    VideoMetadataPanel.tsx
    ScanSettingsPanel.tsx
    ProgressPanel.tsx
    CandidateGallery.tsx
    CandidateCard.tsx
  media/
    loadVideo.ts
    seekVideo.ts
    captureFrame.ts
    extractFullFrame.ts
    clipboard.ts
  analysis/
    frameDifference.ts
    histogramDifference.ts
    candidateRanking.ts
    scanVideo.ts
  workers/
    frameAnalysis.worker.ts
  types/
    media.ts
    scan.ts

19. MVP implementation plan
Phase 1: Core prototype

Build:

File picker
Hidden video loader
Metadata display
Scan button
Basic frame-difference detection
Candidate list with timestamps
Thumbnail generation

No worker yet.

Phase 2: Copy/export

Add:

Full-res frame extraction
Copy PNG to clipboard
Download PNG fallback
Remove candidate
Manual add candidate

Phase 3: Better detection

Add:

Histogram comparison
Temporal bucketing
Threshold presets
“Strict / Balanced / Sensitive” modes

Example presets:

const presets = {
  strict: {
    sampleIntervalSeconds: 1.0,
    changedPixelRatioThreshold: 0.28,
    minSecondsBetweenCaptures: 4,
  },
  balanced: {
    sampleIntervalSeconds: 0.5,
    changedPixelRatioThreshold: 0.18,
    minSecondsBetweenCaptures: 2,
  },
  sensitive: {
    sampleIntervalSeconds: 0.25,
    changedPixelRatioThreshold: 0.1,
    minSecondsBetweenCaptures: 1,
  },
};
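Applying a preset can be a shallow merge over the current settings, so fields a preset does not mention (analysis size, target count) keep their values. In this sketch `PresetName` and `applyPreset` are hypothetical names; the `ScanSettings` shape and the numbers are copied from the settings module and presets above.

```typescript
type ScanSettings = {
  sampleIntervalSeconds: number;
  analysisWidth: number;
  analysisHeight: number;
  pixelDeltaThreshold: number;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
  maxCandidates: number;
  finalTargetCount: number;
};

type PresetName = "strict" | "balanced" | "sensitive";

const presets: Record<PresetName, Partial<ScanSettings>> = {
  strict: { sampleIntervalSeconds: 1.0, changedPixelRatioThreshold: 0.28, minSecondsBetweenCaptures: 4 },
  balanced: { sampleIntervalSeconds: 0.5, changedPixelRatioThreshold: 0.18, minSecondsBetweenCaptures: 2 },
  sensitive: { sampleIntervalSeconds: 0.25, changedPixelRatioThreshold: 0.1, minSecondsBetweenCaptures: 1 },
};

// Returns a new settings object; the input is left untouched so it is safe in React state.
function applyPreset(current: ScanSettings, name: PresetName): ScanSettings {
  return { ...current, ...presets[name] };
}
```

Typing the presets as `Partial<ScanSettings>` also lets the compiler catch a misspelled field name in a preset.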
Phase 4: Worker optimization

Move frame comparison into a worker.

Phase 5: Advanced media backend

Evaluate:

Mediabunny + WebCodecs

This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny is actively positioned as a modern browser media toolkit, and WebCodecs is the underlying native browser primitive for direct encoded/decoded media handling.

20. Final recommendation

For your app, build V1 with:

Vite + React + TypeScript
HTMLVideoElement
Canvas
Blob URLs
Clipboard API
Optional Zustand

Do not start with:

ffmpeg.wasm
OpenCV.js
Server-side processing

Best V1 strategy:

Sample every 0.5s
Analyze at 160×90
Compare against last accepted key frame
Use threshold + minimum time gap
Rank candidates down to 4–10
Extract full-res only when displaying/copying