
Pipeline Reference

The VIDIM pipeline processes MXF broadcast files through 11 sequential stages, each powered by a dedicated model or engine. Processing a typical heat takes 5-6 minutes.

AI Models

Three models run on an NVIDIA RTX 5080 (16 GB VRAM). Total VRAM usage is ~5.1 GB during active processing.

Qwen3-VL-8B | ~4.7 GB VRAM | GGUF Q4_K_M
Vision-language model. Overlay reading fallback for ambiguous frames (~5%). Loaded on-demand, auto-unloads after 5 min idle.
CLIP ViT-B/32 | ~0.36 GB VRAM | PyTorch
Scene classification with trained 16-class head. Always loaded during pipeline. GPU-accelerated batch processing.
RapidOCR PP-OCRv4 | Shared CUDA VRAM | ONNX
Broadcast overlay text extraction. CUDA-accelerated via onnxruntime-gpu. Reads athlete names, times, scores.
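The on-demand loading and 5-minute idle unload described for Qwen3-VL-8B could be sketched roughly as follows. The class name `IdleUnloader`, the `loader` callback, and the injectable `clock` are hypothetical names for illustration; this reference does not show how VIDIM actually manages model lifetime.

```python
import time
from typing import Any, Callable, Optional

IDLE_TIMEOUT_S = 5 * 60  # 5 min idle, per the Qwen3-VL-8B description


class IdleUnloader:
    """Loads a model on first use and drops it after a period of inactivity.

    Hypothetical helper: `loader` stands in for whatever actually loads the model.
    """

    def __init__(
        self,
        loader: Callable[[], Any],
        timeout_s: float = IDLE_TIMEOUT_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._timeout_s = timeout_s
        self._clock = clock
        self._model: Optional[Any] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it if needed, and reset the idle timer."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def tick(self) -> bool:
        """Call periodically. Unloads after the idle timeout; returns True if it unloaded."""
        if self._model is not None and self._clock() - self._last_used >= self._timeout_s:
            self._model = None  # drop the reference so the memory can be reclaimed
            return True
        return False
```

In this sketch, each overlay-reading fallback would call `get()`, and a background loop would call `tick()` every few seconds.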

11 Pipeline Stages

01 frame_extraction
Extracts JPEG frames at 1 fps from the MXF broadcast file. Typically produces ~3600 frames per hour of content. Uses an ffmpeg subprocess for reliable MXF decoding.
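A minimal sketch of how a 1 fps extraction might be driven from Python. The function names, the output filename pattern, and the JPEG quality setting (`-q:v 2`) are assumptions; only the use of an ffmpeg subprocess and the 1 fps rate come from the stage description.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(mxf_path: str, out_dir: str, fps: int = 1) -> List[str]:
    """Build an ffmpeg command that writes one JPEG per sampled frame."""
    pattern = str(Path(out_dir) / "frame_%06d.jpg")  # hypothetical naming scheme
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", mxf_path,
        "-vf", f"fps={fps}",  # sample the video at the requested frame rate
        "-q:v", "2",          # assumed JPEG quality; lower means higher quality
        pattern,
    ]


def extract_frames(mxf_path: str, out_dir: str, fps: int = 1) -> None:
    """Run ffmpeg; raises CalledProcessError if decoding fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(mxf_path, out_dir, fps), check=True)
```

At 1 fps, a one-hour file yields about 3600 JPEGs, which matches the frame count quoted above.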
02 scene_classification
Classifies each frame into 16 scene types using a trained classification head on CLIP ViT-B/32 embeddings. Scene types include: start house, push start, active run, finish area, replay, standings, start list, results, and more.
03 ocr
Reads all on-screen text from broadcast graphics overlays using RapidOCR (PP-OCRv4 on ONNX with CUDA acceleration). Extracts athlete names, country codes, split times, speed readouts, ranking positions, and event metadata.
04 overlay_reading
Structures raw OCR text into typed data fields using rule-based GFX parsers. Handles ATHLETE_ID nameplates, STANDINGS tables, START_LIST, RESULTS, FINISH_TIME, SPLIT_TIMES. Falls back to Qwen3-VL-8B for ambiguous frames (~5%).
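To illustrate the rule-based approach, here is a sketch of a parser for a single field type (FINISH_TIME). The regular expression, the `FinishTime` dataclass, and the comma-to-dot normalization are assumptions about what such a rule might look like, not the actual VIDIM parsers. Returning `None` marks the frame as ambiguous, which is where a vision-language fallback would take over.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: optional minutes, seconds, two-digit hundredths,
# e.g. "54.32" or "1:02.45".
_TIME_RE = re.compile(r"^(?:(\d):)?(\d{1,2})\.(\d{2})$")


@dataclass
class FinishTime:
    raw: str        # OCR text as read from the overlay
    seconds: float  # parsed value in seconds


def parse_finish_time(ocr_text: str) -> Optional[FinishTime]:
    """Parse a finish-time string; return None if the text is ambiguous."""
    # Assumed OCR quirk: a decimal point may be read as a comma.
    text = ocr_text.strip().replace(",", ".")
    match = _TIME_RE.match(text)
    if match is None:
        return None  # ambiguous: candidate for the VLM fallback
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2))
    hundredths = int(match.group(3))
    if seconds >= 60:
        return None  # not a plausible time value
    return FinishTime(raw=ocr_text, seconds=minutes * 60 + seconds + hundredths / 100)
```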
05 roster_crossref
Fuzzy-matches extracted athlete names against the canonical roster for the event. Handles partial names, OCR errors, and name variations across 24 national federations.
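A rough sketch of fuzzy roster matching using the standard library's `difflib`. The matching algorithm VIDIM uses is not specified here, so `difflib`, the 0.75 cutoff, and the roster names in the usage example are all illustrative assumptions. This version tolerates OCR character errors but would need extra rules for partial names.

```python
import difflib
from typing import List, Optional


def match_athlete(ocr_name: str, roster: List[str], cutoff: float = 0.75) -> Optional[str]:
    """Return the roster entry most similar to the OCR text, or None if nothing is close."""
    # Compare case-insensitively but return the canonical spelling.
    normalized = {name.upper(): name for name in roster}
    hits = difflib.get_close_matches(
        ocr_name.strip().upper(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None
```

For example, with a made-up roster `["Jan Novak", "Maria Keller", "Tom Berg"]`, the OCR reading `"MARIA KELLFR"` resolves to `"Maria Keller"`, while unrelated text returns `None`.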
06 state_machine
Tracks broadcast state transitions: preshow, start_house, push_start, active_run, finish, replay, standings, commercial. Determines when each athlete run begins and ends.
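One way to derive run boundaries from per-frame states is to group consecutive frames that belong to a run. In the sketch below, the choice of `push_start`, `active_run`, and `finish` as the "in-run" states is an assumption, as is the function name; the real state machine may use richer transition rules.

```python
from typing import List, Tuple

# Assumption: a run spans these consecutive states.
RUN_STATES = {"push_start", "active_run", "finish"}


def find_runs(states: List[str]) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) index pairs for each contiguous run."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, state in enumerate(states):
        in_run = state in RUN_STATES
        if in_run and start is None:
            start = i                    # run begins
        elif not in_run and start is not None:
            runs.append((start, i - 1))  # run ended on the previous frame
            start = None
    if start is not None:
        runs.append((start, len(states) - 1))  # run still open at end of input
    return runs
```

Since frames are sampled at 1 fps, the frame indices double as second offsets into the broadcast.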
07 cut_policy
Determines optimal clip IN/OUT points from state transitions and GFX boundaries. Applies padding rules, minimum duration constraints, and handles edge cases like crashes and replays.
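The padding and minimum-duration rules might look something like the sketch below. The padding values (2 s before, 3 s after), the 10 s minimum, and the function name are placeholders, not VIDIM's actual settings, and crash or replay handling is left out.

```python
from typing import Optional, Tuple


def apply_cut_policy(
    run_start_s: float,
    run_end_s: float,
    media_duration_s: float,
    pre_pad_s: float = 2.0,        # assumed lead-in padding
    post_pad_s: float = 3.0,       # assumed tail padding
    min_duration_s: float = 10.0,  # assumed minimum clip length
) -> Optional[Tuple[float, float]]:
    """Return padded (in, out) points in seconds, or None if the clip is too short."""
    # Clamp so padding never reaches outside the source file.
    tc_in = max(0.0, run_start_s - pre_pad_s)
    tc_out = min(media_duration_s, run_end_s + post_pad_s)
    if tc_out - tc_in < min_duration_s:
        return None
    return (tc_in, tc_out)
```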
08 clip_creation
Assembles final clip metadata: athlete identity, clip type, timecodes (TC in/out), frame ranges, duration, and associated OCR data.
09 clip_scoring
Scores each clip 0.0–1.0 using a 6-factor model: completeness, name confidence, boundary quality, drama value, visual quality, and data richness.
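As an illustration only, a 6-factor score could be computed as a weighted average of per-factor values, each clamped to the 0.0 to 1.0 range. The equal default weights and the snake_case factor keys are assumptions; the weighting VIDIM actually uses is not documented here.

```python
from typing import Dict, Optional

FACTORS = (
    "completeness",
    "name_confidence",
    "boundary_quality",
    "drama_value",
    "visual_quality",
    "data_richness",
)


def score_clip(factors: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the six factors; missing factors count as 0.0."""
    # Assumption: equal weights unless the caller supplies its own.
    weights = weights or {name: 1.0 for name in FACTORS}
    total_weight = sum(weights[name] for name in FACTORS)
    weighted = sum(
        weights[name] * min(1.0, max(0.0, factors.get(name, 0.0)))
        for name in FACTORS
    )
    return round(weighted / total_weight, 3)
```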
10 export
Writes the clip catalog, generates thumbnails for IN/OUT frames, and exports selected clips. Produces JSON metadata and optional MXF clip files.
11 report_generation
Generates a comprehensive JSON report with MXF metadata, pipeline timing, clip statistics, athlete roster, full clip catalog, GFX breakdown, and detected issues.
Abort with the stop button or POST /api/scan/abort. Partial results are preserved.
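A small sketch of calling the abort endpoint from Python using only the standard library. The base URL is taken from the `localhost:8000` address shown in the UI status bar and may differ in your deployment. The empty request body, the absence of authentication, and the function names are assumptions.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # address shown in the UI status bar; adjust as needed


def build_abort_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for the abort endpoint (assumes an empty body is accepted)."""
    return urllib.request.Request(f"{base_url}/api/scan/abort", data=b"", method="POST")


def abort_scan(base_url: str = BASE_URL, timeout_s: float = 5.0) -> int:
    """Send the abort request and return the HTTP status code."""
    with urllib.request.urlopen(build_abort_request(base_url), timeout=timeout_s) as resp:
        return resp.status
```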

Pipeline UI Preview

This is what the pipeline control interface looks like during an active scan. The status bar shows model state, VRAM usage, and IRC connection.

Status bar: VIDIM v2.1 | localhost:8000 | MODEL: Qwen3-VL | VRAM: 4.7 GB | IRC: CONNECTED | PIPELINE: RUNNING
MXF FILES (1/1 selected): x2526_WC7_AL_D_4M_H1.mxf (2.1 GB)
File selector panel. Check files to include in scan.
running | ETA: 3m 12s
Launches 11-stage analysis
Status indicators show live model, VRAM, IRC connection, and pipeline state
PIPELINE STATUS: ocr | Frame 125 / 530 | 36%
Stages: 1 frames, 2 scene, 3 ocr, 4 overlay, 5 roster, 6 state, 7 cut, 8 clip, 9 score, 10 export, 11 report
Green = completed stages
Cyan pulse = current active stage
Gray = pending stages

Performance

Typical Processing Time
1-hour broadcast: 5-12 min total
Frame extraction: ~45s (3600 frames)
OCR: ~3 min (primary bottleneck, ~57%)
Output: 30-50 clips, 15-20 athlete runs
Discipline Codes
4M = 4-Man Bobsleigh
2M = 2-Man Bobsleigh
WS = Women's Skeleton
MS = Men's Skeleton
WB = Women's Bobsleigh
MONO = Monobob