Use this tutorial when you want a repeatable video-understanding workflow rather than a one-off multimodal prompt.

Best fit

  • large sets of frames
  • scene tagging
  • factual captioning
  • video metadata enrichment
  • model: ClipTagger
  • small related bundles: background jobs or group jobs
  • large queues: batch

Workflow

  1. decide the frame sampling strategy
  2. choose whether the job is a small related bundle (background or group jobs) or a large offline queue (batch)
  3. run frames through ClipTagger
  4. aggregate frame outputs into your higher-level video result
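
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ClipTagger API: `tag_frame` is a hypothetical stand-in for whatever call you use to send one frame to ClipTagger, the 100-frame cutoff in `choose_mode` is an arbitrary example value, and the "keep tags seen in at least half the frames" rule is just one possible aggregation strategy.

```python
from collections import Counter
from typing import Callable, Iterable


def sample_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Step 1: uniform sampling, one timestamp every interval_s seconds."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration_s must be >= 0 and interval_s must be > 0")
    timestamps = []
    i = 0
    while i * interval_s < duration_s:
        timestamps.append(float(i * interval_s))
        i += 1
    return timestamps


def choose_mode(n_frames: int, batch_threshold: int = 100) -> str:
    """Step 2: pick a job style. The threshold is an assumed example value."""
    return "batch" if n_frames >= batch_threshold else "background_or_group"


def tag_frames(
    timestamps: Iterable[float],
    tag_frame: Callable[[float], list[str]],
) -> list[list[str]]:
    """Step 3: run each sampled frame through the tagger.

    tag_frame is a hypothetical callable that returns the tags for the
    frame at a given timestamp; replace it with your real ClipTagger call.
    """
    return [tag_frame(t) for t in timestamps]


def aggregate_tags(
    frame_tags: Iterable[Iterable[str]],
    min_share: float = 0.5,
) -> list[str]:
    """Step 4: keep tags that appear in at least min_share of the frames."""
    per_frame = [set(tags) for tags in frame_tags]
    if not per_frame:
        return []
    counts = Counter(tag for tags in per_frame for tag in tags)
    cutoff = min_share * len(per_frame)
    kept = [tag for tag, n in counts.items() if n >= cutoff]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(kept, key=lambda tag: (-counts[tag], tag))


if __name__ == "__main__":
    # Fake tagger used only to make the sketch runnable end to end.
    def fake_tagger(t: float) -> list[str]:
        return ["car", "road"] if t < 5 else ["car"]

    stamps = sample_timestamps(duration_s=10, interval_s=2.5)
    print(choose_mode(len(stamps)))
    print(aggregate_tags(tag_frames(stamps, fake_tagger)))
```

Keeping the tagger behind a plain callable means the sampling and aggregation logic stays the same whether the frames are submitted as background jobs, group jobs, or a batch.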