docs: add design document for local video key-frame extractor

2026-05-14 00:21:32 -07:00
parent 2ea3fdee30
commit 2e704349f7
1 changed files with 929 additions and 0 deletions
@@ -0,0 +1,929 @@
 Recommended approach
 Use:
 <input type="file">
 → URL.createObjectURL(file)
 → hidden <video>
 → sample frames periodically
 → draw sampled frame to canvas
 → compare against previous accepted frame
 → keep only significant-change frames
 → user clicks Copy on selected image
 → navigator.clipboard.write()
 URL.createObjectURL(file) gives the browser a local blob URL for the selected file, and you should release it with URL.revokeObjectURL() when done.
 Do not analyze every frame at full resolution
 For change detection, use a small analysis canvas:
 video frame → draw to 160×90 or 320×180 canvas → compare pixels
 Then only render the full-resolution frame when you decide it is worth keeping.
 Example strategy:
 const ANALYSIS_W = 160;
 const ANALYSIS_H = 90;
 const SAMPLE_INTERVAL = 0.5; // seconds
 const CHANGE_THRESHOLD = 0.18; // tune this
 That means a 10-minute video sampled every 0.5 seconds is only:
 10 min × 60 / 0.5 = 1200 comparisons
 At 160×90, that is cheap.
 Simple frame-difference algorithm
 Use a downscaled canvas and compare luminance differences:
 function frameDifference(a, b) {
  let changed = 0;
  const pixels = a.data.length / 4;
  for (let i = 0; i < a.data.length; i += 4) {
    const lumA = 0.299 * a.data[i] + 0.587 * a.data[i + 1] + 0.114 * a.data[i + 2];
    const lumB = 0.299 * b.data[i] + 0.587 * b.data[i + 1] + 0.114 * b.data[i + 2];
    if (Math.abs(lumA - lumB) > 32) changed++;
  }
  return changed / pixels;
 }
 Then:
 if (diff > CHANGE_THRESHOLD) {
  // save this timestamp as a candidate key frame
 }
 Better heuristic
 Instead of comparing every sampled frame to the immediately previous sample, compare it to the last accepted key frame.
 That avoids saving several frames during the same transition.
 Frame A accepted
 Frame B slightly different → reject
 Frame C slightly different → reject
 Frame D substantially different from A → accept
 Also add a minimum time gap:
 const MIN_SECONDS_BETWEEN_ACCEPTED = 2.0;
 Clipboard copying
 Copying images to the clipboard is possible with navigator.clipboard.write() and ClipboardItem. Browsers commonly support PNG image data, but clipboard access requires a secure context such as HTTPS or localhost, and browser/user activation rules apply.
 Example:
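 async function copyCanvasToClipboard(canvas) {
  // toBlob passes null when the canvas cannot be encoded, so reject in that case.
  const blob = await new Promise((resolve, reject) =>
    canvas.toBlob(
      b => (b ? resolve(b) : reject(new Error("Canvas export failed"))),
      "image/png"
    )
  );
  await navigator.clipboard.write([
    new ClipboardItem({
      "image/png": blob
    })
  ]);
 }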
 Important: this should happen from a user gesture, like clicking a Copy button.
 Memory profile
 This is light if implemented correctly:
 1 local video blob URL
 1 hidden video element
 1 small analysis canvas
 1 full-res preview canvas/image
 4–10 saved output blobs
 Avoid:
 Storing hundreds/thousands of full-res canvases
 Storing every sampled frame
 Using FileReader.readAsDataURL() on the video file
 Exporting a huge ZIP unless needed
 For your target, keeping 4–10 PNG/WebP blobs in memory is fine.
 Practical UX flow
 I would build it like this:
 User selects video.
 App scans with adjustable settings:
 sample interval
 change threshold
 minimum seconds between captures
 max captures
 App shows candidate thumbnails.
 User deletes bad picks if needed.
 Each thumbnail has:
 Copy PNG
 Download
 timestamp
 Optional: button to rescan with stricter/looser threshold.
 Caveats
 Plain <video> seeking may not be perfectly frame-accurate.
 Sampling every 0.25–1.0s is much more practical than decoding every frame.
 Browser codec support matters. MP4/H.264 is safest.
 Safari/Firefox clipboard image support can be more limited than Chromium-based browsers.
 For exact frame extraction, use WebCodecs or ffmpeg.wasm, but for your “significant scene change” use case, <video> + canvas is probably enough.
 My recommended V1 stack
 React/Vite
 HTMLVideoElement
 Canvas API
 Blob/Object URLs
 navigator.clipboard.write()
 Optional: localForage or IndexedDB if you want session persistence
 No backend required. No upload required.
 What would be the best framework or library to handle these tasks? Come up with a detailed implementation / design document. 
 I’ll treat this as a local-only browser app design, optimized for your 100–1000 MB input range and 4–10 final images. I’ll ground the library/API choices against current browser support rather than assuming older constraints.
 Recommendation
 Use React + Vite + TypeScript for the app shell, but do the actual video work with native browser APIs first:
 Vite + React + TypeScript
 HTMLVideoElement
 Canvas / OffscreenCanvas
 Web Worker
 Blob URLs
 Clipboard API
 For your use case, I would not start with ffmpeg.wasm. It is powerful, but it is overkill for scene/key-frame extraction and can become memory-heavy with 100–1000 MB files. ffmpeg.wasm is a WebAssembly port of FFmpeg that runs in the browser, and its usual workflow copies the input into an in-memory file system before processing, so large-file memory limits and browser stability are recurring themes in user reports and project issues.
 For a stronger V2, consider Mediabunny + WebCodecs. Mediabunny is a TypeScript media toolkit designed for browser-side reading/writing/conversion, and WebCodecs exposes native VideoFrame, EncodedVideoChunk, VideoDecoder, and VideoEncoder primitives.
 Design Document: Local Video Key-Frame Extractor
 1. Goal
 Build a browser-only app that allows a user to:
 Select a local video file.
 Analyze it without uploading to a server.
 Detect frames where the image content has significantly changed.
 Present a small set of candidate key images.
 Let the user copy selected images to the clipboard.
 Target input:
 Video size: 100–1000 MB
 Expected output: 4–10 still images
 Processing model: local-only
 Backend: none required
 2. Recommended architecture
 User selects video
    ↓
 Create local Blob URL
    ↓
 Hidden <video> element loads local file
    ↓
 Sampling engine seeks through video
    ↓
 Downscaled analysis canvas compares frames
    ↓
 Candidate detector stores timestamps
    ↓
 Full-resolution renderer extracts selected frames
    ↓
 Gallery shows thumbnails
    ↓
 User copies PNG to clipboard
 Core browser APIs:
 Task	API
 Local file selection	File API
 Local video loading	URL.createObjectURL()
 Video decoding/seeking	HTMLVideoElement
 Pixel comparison	Canvas API
 Off-main-thread analysis	Worker + OffscreenCanvas where available
 Image export	canvas.toBlob()
 Clipboard copy	navigator.clipboard.write() + ClipboardItem
 URL.createObjectURL() is appropriate here because it creates a local blob URL for a File/Blob, avoiding the need to load the entire file into JavaScript memory. The Clipboard API supports writing non-text data through ClipboardItem, but it requires a secure context such as HTTPS or localhost.
 3. Framework choice
 Best V1 stack
 Vite
 React
 TypeScript
 Zustand or simple React state
 Canvas API
 Web Worker
 Why:
 Vite: fast local dev, easy static deployment.
 React: good for gallery, controls, progress UI, and stateful review flow.
 TypeScript: useful because media processing code quickly accumulates edge cases.
 No backend: app can deploy as static files.
 No heavy media dependency initially: browser decoding is enough for threshold-based change detection.
 Suggested install:
 npm create vite@latest video-keyframe-extractor -- --template react-ts
 cd video-keyframe-extractor
 npm install
 npm run dev
 Optional dependencies:
 npm install zustand
 I would avoid extra UI frameworks unless you want polished controls quickly.
 4. Library/API decision matrix
 Option	Use for V1?	Notes
 Native <video> + canvas	Yes	Best starting point. Simple, no server, no large WASM payload.
 Web Worker	Yes	Keeps UI responsive during scans.
 OffscreenCanvas	Yes, optional	Good for worker-side pixel analysis where supported.
 WebCodecs	V2	Better low-level control, but more complex. WebCodecs is browser-native and provides direct access to decoded frames.
 Mediabunny	V2 / advanced	Good candidate for more robust container/media handling. It advertises efficient browser-side reading and only loading what is needed.
 ffmpeg.wasm	Avoid for V1	Heavy and memory-risky for 100–1000 MB videos. Better for fallback/export/transcoding, not first-pass scene detection.
 OpenCV.js	Probably avoid	Useful for advanced CV, but heavy for simple scene-difference detection.
 5. App modules
 5.1 File intake module
 Responsibilities:
 Accept local video file.
 Validate file type and size.
 Create blob URL.
 Load video metadata.
 Display duration, resolution, codec/browser compatibility status if possible.
 Example state:
 type LoadedVideo = {
  file: File;
  objectUrl: string;
  duration: number;
  width: number;
  height: number;
 };
 Important cleanup:
 URL.revokeObjectURL(objectUrl);
 Never read the video this way:
 new FileReader().readAsDataURL(file);
 That would load the whole video into memory and base64-encode it, which makes it about a third larger again.
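 The "validate file type and size" responsibility can be sketched as a pure check. This is a minimal sketch, not the design's required implementation: the names validateVideoFile, FileLike, and MAX_VIDEO_BYTES are hypothetical, and the size cap simply mirrors the 100–1000 MB target range.

```typescript
// Hypothetical limit for illustration; it mirrors the upper end of the target range.
const MAX_VIDEO_BYTES = 1000 * 1024 * 1024;

// Minimal structural type so the check accepts a real File or plain test data.
type FileLike = { name: string; type: string; size: number };

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateVideoFile(file: FileLike): ValidationResult {
  // The reported MIME type is only a hint: it does not prove the browser can decode the file.
  if (!file.type.startsWith("video/")) {
    return { ok: false, reason: `Not a video file: ${file.type || "unknown type"}` };
  }
  if (file.size === 0) {
    return { ok: false, reason: "File is empty" };
  }
  if (file.size > MAX_VIDEO_BYTES) {
    return { ok: false, reason: "File exceeds the size limit" };
  }
  return { ok: true };
}
```

 Passing this check says nothing about codec support. The error event on the video element remains the real test of whether the browser can decode the file.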
 5.2 Scan settings module
 Expose these controls:
 type ScanSettings = {
  sampleIntervalSeconds: number;       // default: 0.5
  analysisWidth: number;               // default: 160
  analysisHeight: number;              // default: 90
  pixelDeltaThreshold: number;         // default: 32
  changedPixelRatioThreshold: number;  // default: 0.18
  minSecondsBetweenCaptures: number;   // default: 2.0
  maxCandidates: number;               // default: 20
  finalTargetCount: number;            // default: 8
 };
 Suggested defaults:
 const defaultSettings: ScanSettings = {
  sampleIntervalSeconds: 0.5,
  analysisWidth: 160,
  analysisHeight: 90,
  pixelDeltaThreshold: 32,
  changedPixelRatioThreshold: 0.18,
  minSecondsBetweenCaptures: 2.0,
  maxCandidates: 20,
  finalTargetCount: 8,
 };
 5.3 Frame sampling module
 V1 should seek using a hidden video element:
 async function seekVideo(video: HTMLVideoElement, time: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const onSeeked = () => {
      cleanup();
      resolve();
    };
    const onError = () => {
      cleanup();
      reject(video.error);
    };
    const cleanup = () => {
      video.removeEventListener("seeked", onSeeked);
      video.removeEventListener("error", onError);
    };
    video.addEventListener("seeked", onSeeked, { once: true });
    video.addEventListener("error", onError, { once: true });
    video.currentTime = Math.min(time, video.duration);
  });
 }
 Then draw into a small analysis canvas:
 function captureAnalysisFrame(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  width: number,
  height: number
 ): ImageData {
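  // Assigning width or height clears the canvas, so only resize when the size changes.
  if (canvas.width !== width) canvas.width = width;
  if (canvas.height !== height) canvas.height = height;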
  const ctx = canvas.getContext("2d", { willReadFrequently: true });
  if (!ctx) throw new Error("Could not get canvas context");
  ctx.drawImage(video, 0, 0, width, height);
  return ctx.getImageData(0, 0, width, height);
 }
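 The sampling engine also needs the list of timestamps to seek to. A minimal sketch follows, assuming a hypothetical helper named sampleTimes. It derives each timestamp from an integer index, because repeatedly adding an interval such as 0.1 accumulates floating-point error.

```typescript
// Returns the timestamps (in seconds) to sample, starting at 0 and staying below the duration.
function sampleTimes(durationSeconds: number, intervalSeconds: number): number[] {
  // video.duration is NaN before metadata loads and Infinity for live streams.
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) return [];
  if (!(intervalSeconds > 0)) return [];

  const times: number[] = [];
  // Multiply by the index instead of accumulating `time += interval`.
  for (let i = 0; i * intervalSeconds < durationSeconds; i++) {
    times.push(i * intervalSeconds);
  }
  return times;
}
```

 For a 600-second video at a 0.5-second interval this yields 1200 timestamps, matching the estimate given earlier.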
 6. Change detection algorithm
 6.1 Simple and effective V1
 Compare current sampled frame against the last accepted key frame, not merely the immediately previous sample.
 This prevents over-detecting long transitions.
 function frameDifferenceRatio(
  a: ImageData,
  b: ImageData,
  pixelDeltaThreshold: number
 ): number {
  const dataA = a.data;
  const dataB = b.data;
  let changed = 0;
  const pixels = dataA.length / 4;
  for (let i = 0; i < dataA.length; i += 4) {
    const lumA =
      0.299 * dataA[i] +
      0.587 * dataA[i + 1] +
      0.114 * dataA[i + 2];
    const lumB =
      0.299 * dataB[i] +
      0.587 * dataB[i + 1] +
      0.114 * dataB[i + 2];
    if (Math.abs(lumA - lumB) > pixelDeltaThreshold) {
      changed++;
    }
  }
  return changed / pixels;
 }
 Acceptance logic:
 function shouldAcceptCandidate(params: {
  diffRatio: number;
  currentTime: number;
  lastAcceptedTime: number | null;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
 }) {
  const {
    diffRatio,
    currentTime,
    lastAcceptedTime,
    changedPixelRatioThreshold,
    minSecondsBetweenCaptures,
  } = params;
  if (diffRatio < changedPixelRatioThreshold) return false;
  if (
    lastAcceptedTime !== null &&
    currentTime - lastAcceptedTime < minSecondsBetweenCaptures
  ) {
    return false;
  }
  return true;
 }
 6.2 Better V1.5: histogram difference
 Raw pixel comparison can be sensitive to motion, noise, flashes, compression artifacts, and camera pans.
 A better approach is to compare low-resolution color/luma histograms.
 Recommended hybrid score:
 score = 0.7 × luminance histogram difference
      + 0.3 × pixel difference ratio
 This tends to respond to scene or content changes rather than to small motion.
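 The hybrid score can be sketched as follows. This is one possible formulation under stated assumptions: the 32-bin histogram, the half-L1 distance, and the function names are illustrative choices, not something the design above prescribes. Only the 0.7 and 0.3 weights come from the formula.

```typescript
// Assumption: 32 bins, each covering 8 luminance levels.
const BINS = 32;

// Builds a normalised luminance histogram from RGBA bytes (the layout of ImageData.data).
function lumaHistogram(rgba: Uint8ClampedArray): Float64Array {
  const hist = new Float64Array(BINS);
  const pixels = rgba.length / 4;
  if (pixels === 0) return hist;
  for (let i = 0; i < rgba.length; i += 4) {
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    const bin = Math.min(BINS - 1, Math.floor((luma * BINS) / 256));
    hist[bin] += 1;
  }
  for (let b = 0; b < BINS; b++) hist[b] /= pixels;
  return hist;
}

// Half the L1 distance between two normalised histograms: 0 means identical, 1 means disjoint.
function histogramDifference(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < BINS; i++) sum += Math.abs(a[i] - b[i]);
  return sum / 2;
}

// Weighted blend from the formula above; both inputs are expected in the range 0..1.
function hybridScore(histDiff: number, pixelDiffRatio: number): number {
  return 0.7 * histDiff + 0.3 * pixelDiffRatio;
}
```

 Because a histogram ignores where pixels sit, an object moving across an otherwise unchanged scene barely moves this score, while a cut to a differently lit scene moves it a lot.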
 6.3 Optional V2: perceptual hash
 For more stable detection:
 Generate a small grayscale frame, e.g. 32×32.
 Compute average hash or difference hash.
 Compare Hamming distance.
 Accept when distance exceeds threshold.
 This works well when frames differ by compression noise that should be ignored, while meaningful visual changes still need to register.
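 The hash-and-compare steps can be sketched with an average hash. This is a minimal sketch, assuming the caller has already reduced the frame to one luminance byte per pixel; the function names are hypothetical, and a difference hash would follow the same shape.

```typescript
// Average hash: one bit per pixel, set when the pixel is brighter than the frame mean.
// `gray` holds one luminance byte per pixel, for example a 32×32 downscaled frame.
function averageHash(gray: Uint8Array): Uint8Array {
  let sum = 0;
  for (let i = 0; i < gray.length; i++) sum += gray[i];
  const mean = gray.length > 0 ? sum / gray.length : 0;

  const bits = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) bits[i] = gray[i] > mean ? 1 : 0;
  return bits;
}

// Hamming distance: the number of positions where the two hashes disagree.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}
```

 Since each bit is relative to the frame's own mean, a uniform brightness shift leaves the hash unchanged, while a change in layout flips bits. The acceptance threshold on the distance is a tuning parameter.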
 7. Scan workflow
 Pseudo-code:
 async function scanVideo(
  video: HTMLVideoElement,
  settings: ScanSettings,
  onProgress: (progress: number) => void
 ): Promise<CandidateFrame[]> {
  const analysisCanvas = document.createElement("canvas");
  const candidates: CandidateFrame[] = [];
  let lastAcceptedFrame: ImageData | null = null;
  let lastAcceptedTime: number | null = null;
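  // Derive each timestamp from an integer index so floating-point error does not accumulate.
  for (
    let i = 0, time = 0;
    time < video.duration;
    i++, time = i * settings.sampleIntervalSeconds
  ) {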
    await seekVideo(video, time);
    const imageData = captureAnalysisFrame(
      video,
      analysisCanvas,
      settings.analysisWidth,
      settings.analysisHeight
    );
    if (!lastAcceptedFrame) {
      candidates.push({
        time,
        score: 1,
        reason: "initial-frame",
      });
      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
      continue;
    }
    const diffRatio = frameDifferenceRatio(
      lastAcceptedFrame,
      imageData,
      settings.pixelDeltaThreshold
    );
    if (
      shouldAcceptCandidate({
        diffRatio,
        currentTime: time,
        lastAcceptedTime,
        changedPixelRatioThreshold: settings.changedPixelRatioThreshold,
        minSecondsBetweenCaptures: settings.minSecondsBetweenCaptures,
      })
    ) {
      candidates.push({
        time,
        score: diffRatio,
        reason: "visual-change",
      });
      lastAcceptedFrame = imageData;
      lastAcceptedTime = time;
    }
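    onProgress(time / video.duration);
    if (candidates.length >= settings.maxCandidates) {
      // Optional: `break` here to stop early, or keep scanning and let ranking trim the list.
    }
  }
  onProgress(1); // the loop never reports exactly 1, so signal completion here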
  return rankAndTrimCandidates(candidates, settings.finalTargetCount);
 }
 Candidate type:
 type CandidateFrame = {
  time: number;
  score: number;
  reason: "initial-frame" | "visual-change" | "manual";
  thumbnailUrl?: string;
  fullBlob?: Blob;
 };
 8. Candidate ranking
 Since the desired final count is only 4–10, do not simply take every threshold crossing.
 Use ranking:
 Always include first frame if useful.
 Sort detected changes by score.
 Enforce time diversity.
 Keep the top N.
 Example:
 function rankAndTrimCandidates(
  candidates: CandidateFrame[],
  targetCount: number
 ): CandidateFrame[] {
  return [...candidates]
    .sort((a, b) => b.score - a.score)
    .slice(0, targetCount)
    .sort((a, b) => a.time - b.time);
 }
 Better version:
 Divide video into N temporal buckets
 Pick the strongest candidate from each bucket
 Then fill remaining slots by highest score
 This avoids ending up with all images from one active segment.
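 The bucketed version can be sketched as below. This is a sketch under assumptions: the name rankByTemporalBuckets and the explicit durationSeconds parameter are hypothetical, and the function works on anything with a time and a score so it does not depend on the full CandidateFrame type.

```typescript
type Scored = { time: number; score: number };

function rankByTemporalBuckets<T extends Scored>(
  candidates: T[],
  durationSeconds: number,
  targetCount: number
): T[] {
  const slots = Math.floor(targetCount);
  if (!(slots > 0) || !(durationSeconds > 0)) return [];
  const bucketSeconds = durationSeconds / slots;

  // Step 1: keep the strongest candidate in each temporal bucket.
  const best: (T | undefined)[] = [];
  for (let i = 0; i < candidates.length; i++) {
    const c = candidates[i];
    const raw = Math.floor(c.time / bucketSeconds);
    const bucket = Math.max(0, Math.min(slots - 1, raw));
    const current = best[bucket];
    if (current === undefined || c.score > current.score) best[bucket] = c;
  }
  const picked = best.filter((c): c is T => c !== undefined);

  // Step 2: fill any remaining slots with the highest-scoring leftovers.
  const leftovers = candidates
    .filter(c => picked.indexOf(c) === -1)
    .sort((a, b) => b.score - a.score);
  const filled = picked.concat(leftovers.slice(0, slots - picked.length));

  // Return in playback order for display.
  return filled.sort((a, b) => a.time - b.time);
}
```

 With three strong candidates in the first seconds of a 100-second video and weaker ones at 50 and 90 seconds, a plain top-3 sort returns only the opening frames, whereas this version returns one frame from each third of the video.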
 9. Full-resolution extraction
 Only render full-resolution images for selected candidates.
 async function extractFullResolutionFrame(
  video: HTMLVideoElement,
  time: number,
  type: "image/png" | "image/jpeg" | "image/webp" = "image/png",
  quality?: number
 ): Promise<Blob> {
  await seekVideo(video, time);
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("Could not get canvas context");
  ctx.drawImage(video, 0, 0);
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      blob => {
        if (!blob) reject(new Error("Canvas export failed"));
        else resolve(blob);
      },
      type,
      quality
    );
  });
 }
 For the clipboard, prefer PNG, the image type browsers most reliably accept in a ClipboardItem:
 async function copyImageBlobToClipboard(blob: Blob) {
  await navigator.clipboard.write([
    new ClipboardItem({
      [blob.type]: blob,
    }),
  ]);
 }
 Use:
 const blob = await extractFullResolutionFrame(video, candidate.time, "image/png");
 await copyImageBlobToClipboard(blob);
 Clipboard image writing should be initiated by a user action, such as clicking a Copy button, and the app should be served from HTTPS or localhost because the Clipboard API is gated behind secure contexts.
 10. UI design
 Main layout
 ┌──────────────────────────────────────────────┐
 │ Video Key Frame Extractor                    │
 ├──────────────────────────────────────────────┤
 │ [Select video]  filename.mp4                 │
 │ Duration: 04:32 | Size: 612 MB | 1920×1080   │
 ├──────────────────────────────────────────────┤
 │ Scan Settings                                │
 │ Sample interval: [0.5s]                      │
 │ Change threshold: [18%]                      │
 │ Min gap: [2.0s]                              │
 │ Target images: [8]                           │
 │ [Scan Video]                                 │
 ├──────────────────────────────────────────────┤
 │ Progress bar                                 │
 ├──────────────────────────────────────────────┤
 │ Candidate Gallery                            │
 │ [thumb] 00:04.5 score .24 [Copy] [Download]  │
 │ [thumb] 00:18.0 score .41 [Copy] [Download]  │
 │ [thumb] 01:22.5 score .37 [Copy] [Download]  │
 └──────────────────────────────────────────────┘
 Controls
 Minimum controls:
 Select video
 Scan
 Stop scan
 Reset
 Threshold slider
 Sample interval dropdown
 Target count
 Copy image
 Download image
 Advanced controls:
 “Prefer fewer images”
 “Prefer more images”
 “Ignore tiny motion”
 “Include first frame”
 “Manually add current preview frame”
 11. State model
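 Use Zustand or a plain React reducer. The actions below refer to candidates by ID, which assumes CandidateFrame gains an id field; the type in section 7 does not define one.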
 type AppState = {
  video: LoadedVideo | null;
  scanStatus: "idle" | "loading" | "scanning" | "done" | "error";
  settings: ScanSettings;
  candidates: CandidateFrame[];
  selectedCandidateId: string | null;
  progress: number;
  error: string | null;
 };
 Actions:
 type AppActions = {
  loadVideo(file: File): Promise<void>;
  scanVideo(): Promise<void>;
  cancelScan(): void;
  updateSettings(settings: Partial<ScanSettings>): void;
  copyCandidate(candidateId: string): Promise<void>;
  downloadCandidate(candidateId: string): Promise<void>;
  removeCandidate(candidateId: string): void;
 };
 12. Worker design
 For V1, the main thread can own the hidden video element because HTMLVideoElement is DOM-bound. But pixel comparison can be pushed into a worker.
 Main thread:
 seek video
 draw frame to analysis canvas
 get ImageData
 send ImageData buffer to worker
 worker returns diff score
 Worker message:
 type AnalyzeFrameMessage = {
  type: "analyze-frame";
  current: ImageData;
  previous: ImageData;
  pixelDeltaThreshold: number;
 };
 Worker response:
 type AnalyzeFrameResult = {
  type: "frame-score";
  diffRatio: number;
 };
 For your target size, this may not be necessary at first. But it is worth structuring the code so the algorithm can move into a worker later.
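 One way to keep that move cheap is to write the comparison as a pure handler that the worker's onmessage simply calls. The sketch below assumes hypothetical message shapes that carry raw RGBA typed arrays instead of ImageData objects. The worker wiring appears only as comments because it needs a browser.

```typescript
// Hypothetical message shapes carrying raw RGBA bytes (for example imageData.data).
type AnalyzeBytesMessage = {
  type: "analyze-frame";
  current: Uint8ClampedArray;
  previous: Uint8ClampedArray;
  pixelDeltaThreshold: number;
};

type FrameScoreMessage = { type: "frame-score"; diffRatio: number };

// Pure handler: computes the fraction of pixels whose luminance changed by more than the threshold.
function handleAnalyzeFrame(msg: AnalyzeBytesMessage): FrameScoreMessage {
  const a = msg.previous;
  const b = msg.current;
  if (a.length !== b.length) throw new Error("Frame sizes differ");

  const pixels = a.length / 4;
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    const lumA = 0.299 * a[i] + 0.587 * a[i + 1] + 0.114 * a[i + 2];
    const lumB = 0.299 * b[i] + 0.587 * b[i + 1] + 0.114 * b[i + 2];
    if (Math.abs(lumA - lumB) > msg.pixelDeltaThreshold) changed++;
  }
  return { type: "frame-score", diffRatio: pixels > 0 ? changed / pixels : 0 };
}

// Worker side (browser only):
//   self.onmessage = (e) => self.postMessage(handleAnalyzeFrame(e.data));
// Main thread:
//   worker.postMessage(message);
```

 At 160×90 each frame is 57.6 KB, so letting postMessage copy the arrays is cheap. Transferring the underlying buffers avoids the copy but detaches them on the sending side, which matters because the last accepted frame is reused across many comparisons.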
 13. Performance expectations
 For a 10-minute video:
 Sample interval: 0.5s
 Comparisons: ~1200
 Analysis resolution: 160×90
 Pixels per comparison: 14,400
 Total pixel comparisons: ~17.3 million
 That is very reasonable in browser JavaScript.
 For a 30-minute video:
 Sample interval: 0.5s
 Comparisons: ~3600
 Analysis resolution: 160×90
 Total pixel comparisons: ~51.8 million
 Still reasonable. The bottleneck will usually be video seeking/decoding, not the pixel math.
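 The arithmetic behind these figures can be reproduced with a small helper, which is handy for showing an estimate in the settings panel. This is a sketch; estimateScanCost and its field names are hypothetical.

```typescript
type ScanCost = {
  samples: number;               // number of seeks and comparisons
  pixelsPerFrame: number;        // analysis width × height
  totalPixelComparisons: number; // samples × pixelsPerFrame
  analysisFrameBytes: number;    // RGBA bytes held per analysis frame
};

function estimateScanCost(
  durationSeconds: number,
  sampleIntervalSeconds: number,
  analysisWidth: number,
  analysisHeight: number
): ScanCost {
  const samples = Math.ceil(durationSeconds / sampleIntervalSeconds);
  const pixelsPerFrame = analysisWidth * analysisHeight;
  return {
    samples,
    pixelsPerFrame,
    totalPixelComparisons: samples * pixelsPerFrame,
    analysisFrameBytes: pixelsPerFrame * 4,
  };
}
```

 The sample count is also the number of seeks, which is the figure that matters most because seeking and decoding dominate the scan time.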
 14. File-size considerations
 Your 100–1000 MB range is practical because:
 The video is referenced via local file blob URL.
 You only decode sampled frames.
 You only store 4–10 output blobs.
 Analysis frames are tiny.
 Memory use should look roughly like:
 Video decoder buffers: browser-managed
 Analysis frame A: ~160 × 90 × 4 = 57.6 KB
 Analysis frame B: ~57.6 KB
 Full-res output image: only when accepted/copied
 Candidate blobs: 4–10 images
 At 1920×1080, one raw RGBA canvas frame is about:
 1920 × 1080 × 4 = ~8.3 MB
 At 3840×2160:
 3840 × 2160 × 4 = ~33.2 MB
 Still fine if you only create one full-res canvas at a time.
 15. Browser compatibility
 Best target
 Start with Chromium-based browsers:
 Chrome
 Edge
 Brave
 Arc
 Chromium
 Reasons:
 Strong media support.
 Good Clipboard API support.
 Good WebCodecs path for later.
 Firefox should work for the V1 <video> + canvas path. WebCodecs VideoDecoder is listed by MDN as supported in Firefox 130+, while Chromium browsers have supported it since version 94.
 Safari may work for simple video/canvas extraction, but clipboard image writing and codec behavior are areas to test carefully.
 16. Security and privacy model
 The app should explicitly state:
 Your video is processed locally in your browser.
 The file is not uploaded.
 No server receives the video or extracted images.
 Implementation requirements:
 Do not send file contents to analytics.
 Do not log object URLs remotely.
 Avoid third-party scripts if privacy is part of the value prop.
 Serve over HTTPS.
 Make the app installable as a PWA if desired.
 17. Error handling
 Handle:
 Case	Response
 Unsupported video codec	Show “This browser cannot decode this video.”
 Huge resolution	Offer lower-quality extraction or warn about memory.
 Clipboard blocked	Offer download fallback.
 Scan too slow	Let user increase sample interval.
 Too many candidates	Raise threshold or bucket results.
 No candidates found	Lower threshold or include first/middle/end frames.
 Clipboard fallback:
 try {
  await copyImageBlobToClipboard(blob);
 } catch {
  downloadBlob(blob, "frame.png");
 }
 18. Suggested file structure
 src/
  app/
    App.tsx
    store.ts
  components/
    FilePicker.tsx
    VideoMetadataPanel.tsx
    ScanSettingsPanel.tsx
    ProgressPanel.tsx
    CandidateGallery.tsx
    CandidateCard.tsx
  media/
    loadVideo.ts
    seekVideo.ts
    captureFrame.ts
    extractFullFrame.ts
    clipboard.ts
  analysis/
    frameDifference.ts
    histogramDifference.ts
    candidateRanking.ts
    scanVideo.ts
  workers/
    frameAnalysis.worker.ts
  types/
    media.ts
    scan.ts
 19. MVP implementation plan
 Phase 1: Core prototype
 Build:
 File picker
 Hidden video loader
 Metadata display
 Scan button
 Basic frame-difference detection
 Candidate list with timestamps
 Thumbnail generation
 No worker yet.
 Phase 2: Copy/export
 Add:
 Full-res frame extraction
 Copy PNG to clipboard
 Download PNG fallback
 Remove candidate
 Manual add candidate
 Phase 3: Better detection
 Add:
 Histogram comparison
 Temporal bucketing
 Threshold presets
 “Strict / Balanced / Sensitive” modes
 Example presets:
 const presets = {
  strict: {
    sampleIntervalSeconds: 1.0,
    changedPixelRatioThreshold: 0.28,
    minSecondsBetweenCaptures: 4,
  },
  balanced: {
    sampleIntervalSeconds: 0.5,
    changedPixelRatioThreshold: 0.18,
    minSecondsBetweenCaptures: 2,
  },
  sensitive: {
    sampleIntervalSeconds: 0.25,
    changedPixelRatioThreshold: 0.1,
    minSecondsBetweenCaptures: 1,
  },
 };
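 Applying a preset is then a matter of overlaying its three fields on the current settings. The sketch below repeats the ScanSettings shape and the preset values from above so that it stands alone; applyPreset and PresetName are hypothetical names.

```typescript
type ScanSettings = {
  sampleIntervalSeconds: number;
  analysisWidth: number;
  analysisHeight: number;
  pixelDeltaThreshold: number;
  changedPixelRatioThreshold: number;
  minSecondsBetweenCaptures: number;
  maxCandidates: number;
  finalTargetCount: number;
};

type PresetName = "strict" | "balanced" | "sensitive";

// A preset only overrides the three detection-related fields.
type Preset = Pick<
  ScanSettings,
  "sampleIntervalSeconds" | "changedPixelRatioThreshold" | "minSecondsBetweenCaptures"
>;

const PRESETS: Record<PresetName, Preset> = {
  strict: { sampleIntervalSeconds: 1.0, changedPixelRatioThreshold: 0.28, minSecondsBetweenCaptures: 4 },
  balanced: { sampleIntervalSeconds: 0.5, changedPixelRatioThreshold: 0.18, minSecondsBetweenCaptures: 2 },
  sensitive: { sampleIntervalSeconds: 0.25, changedPixelRatioThreshold: 0.1, minSecondsBetweenCaptures: 1 },
};

// Returns a new settings object; fields the preset does not mention are kept as they were.
function applyPreset(current: ScanSettings, presetName: PresetName): ScanSettings {
  return { ...current, ...PRESETS[presetName] };
}
```

 Returning a new object rather than mutating the old one fits both a React reducer and a Zustand store, where state updates are detected by reference.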
 Phase 4: Worker optimization
 Move frame comparison into a worker.
 Phase 5: Advanced media backend
 Evaluate:
 Mediabunny + WebCodecs
 This is the path if you want better control over frame extraction, codecs, and container-level handling. Mediabunny presents itself as a modern browser media toolkit, and WebCodecs is the native browser primitive underneath for working directly with encoded and decoded media.
 20. Final recommendation
 For your app, build V1 with:
 Vite + React + TypeScript
 HTMLVideoElement
 Canvas
 Blob URLs
 Clipboard API
 Optional Zustand
 Do not start with:
 ffmpeg.wasm
 OpenCV.js
 Server-side processing
 Best V1 strategy:
 Sample every 0.5s
 Analyze at 160×90
 Compare against last accepted key frame
 Use threshold + minimum time gap
 Rank candidates down to 4–10
 Extract full-res only when displaying/copying