Pipeline Reference

The VIDIM pipeline processes MXF broadcast files through 11 sequential stages, most of them powered by a dedicated model or engine. Processing a typical heat takes 5-6 minutes.

AI Models

Three models run on an NVIDIA RTX 5080 (16 GB VRAM). Total VRAM usage is ~5.1 GB during active processing.

Qwen3-VL-8B | ~4.7 GB VRAM | GGUF Q4_K_M

Vision-language model. Overlay reading fallback for ambiguous frames (~5%). Loaded on-demand, auto-unloads after 5 min idle.

CLIP ViT-B/32 | ~0.36 GB VRAM | PyTorch

Scene classification with trained 16-class head. Always loaded during pipeline. GPU-accelerated batch processing.

RapidOCR PP-OCRv4 | Shared CUDA VRAM | ONNX

Broadcast overlay text extraction. CUDA-accelerated via onnxruntime-gpu. Reads athlete names, times, scores.

11 Pipeline Stages

01 frame_extraction | ffmpeg

Extracts 1fps JPEG frames from MXF broadcast file. Typically produces ~3600 frames per hour of content. Uses ffmpeg subprocess for reliable MXF decoding.
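The extraction step can be sketched as a thin wrapper around an ffmpeg subprocess. This is a minimal sketch, not the pipeline's actual code: the function names, the output naming pattern, and the JPEG quality value are assumptions; only the 1 fps sampling and the use of ffmpeg come from the description above.

```python
import subprocess
from pathlib import Path


def build_extract_command(mxf_path: str, out_dir: str, fps: int = 1) -> list[str]:
    """Build an ffmpeg command that samples the video at `fps` frames per second."""
    # Hypothetical naming scheme: frame_000001.jpg, frame_000002.jpg, ...
    pattern = str(Path(out_dir) / "frame_%06d.jpg")
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", mxf_path,
        "-vf", f"fps={fps}",  # resample to the requested frame rate
        "-q:v", "2",          # JPEG quality, lower is better (assumed value)
        pattern,
    ]


def extract_frames(mxf_path: str, out_dir: str, fps: int = 1) -> None:
    """Run ffmpeg and raise CalledProcessError if decoding fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(mxf_path, out_dir, fps), check=True)
```

At 1 fps, an hour of content yields roughly 3600 output files, which matches the figure quoted above.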

02 scene_classification | CLIP ViT-B/32

Classifies each frame into 16 scene types using a trained classification head on CLIP ViT-B/32 embeddings. Scene types include: start house, push start, active run, finish area, replay, standings, start list, results, and more.

03 ocr | RapidOCR (ONNX)

Reads all on-screen text from broadcast graphics overlays using RapidOCR (PP-OCRv4 on ONNX with CUDA acceleration). Extracts athlete names, country codes, split times, speed readouts, ranking positions, and event metadata.

04 overlay_reading | Rule-based + VLM

Structures raw OCR text into typed data fields using rule-based GFX parsers. Handles ATHLETE_ID nameplates, STANDINGS tables, START_LIST, RESULTS, FINISH_TIME, SPLIT_TIMES. Falls back to Qwen3-VL-8B for ambiguous frames (~5%).
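To illustrate the rule-based path and the point at which a frame would be handed to the VLM, here is a minimal sketch of a FINISH_TIME parser. The token layout, the regular expressions, and the output field names are all assumptions made for illustration; the real GFX parsers are not described in this document.

```python
import re
from typing import Optional

# Assumed formats: "54.32" or "1:48.25" for times, three capital letters for countries.
TIME_RE = re.compile(r"^(?:(\d+):)?(\d{1,2})\.(\d{2})$")
COUNTRY_RE = re.compile(r"^[A-Z]{3}$")


def parse_time_seconds(token: str) -> Optional[float]:
    """Convert a time token to seconds, or return None if it is not a time."""
    m = TIME_RE.match(token)
    if not m:
        return None
    minutes = int(m.group(1) or 0)
    return minutes * 60 + int(m.group(2)) + int(m.group(3)) / 100


def parse_finish_overlay(tokens: list[str]) -> Optional[dict]:
    """Return typed fields, or None when the frame is ambiguous.

    A None result is where a pipeline like this one would fall back to the VLM.
    """
    countries = [t for t in tokens if COUNTRY_RE.match(t)]
    times = [s for s in map(parse_time_seconds, tokens) if s is not None]
    if len(countries) != 1 or len(times) != 1:
        return None
    name = " ".join(
        t for t in tokens if t not in countries and parse_time_seconds(t) is None
    )
    return {"type": "FINISH_TIME", "name": name, "country": countries[0], "time_s": times[0]}
```

A frame with two country codes or no readable time returns None rather than a guess, which keeps the rule-based output trustworthy and leaves the uncertain minority of frames to the fallback model.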

05 roster_crossref | Fuzzy match

Fuzzy-matches extracted athlete names against the canonical roster for the event. Handles partial names, OCR errors, and name variations across 24 national federations.
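One simple way to approximate this step uses the standard library's difflib. This is a sketch under assumptions: the document does not say which fuzzy-matching algorithm or threshold the pipeline uses, and the function name, the 0.75 cutoff, and the roster entries below are hypothetical.

```python
import difflib
from typing import Optional


def match_roster(ocr_name: str, roster: list[str], cutoff: float = 0.75) -> Optional[str]:
    """Return the canonical roster name closest to an OCR reading, or None."""
    # Compare case-insensitively, but return the canonical spelling.
    normalized = {name.upper(): name for name in roster}
    hits = difflib.get_close_matches(
        ocr_name.upper().strip(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None


# Hypothetical roster for illustration only.
roster = ["Anna Beispiel", "John Sample", "Marie Exemple"]
print(match_roster("J0HN SAMPLE", roster))  # OCR confused "O" with "0"
```

Returning None below the cutoff, instead of the best available candidate, avoids attaching a clip to the wrong athlete when the OCR text is too damaged to trust.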

06 state_machine | State machine

Tracks broadcast state transitions: preshow, start_house, push_start, active_run, finish, replay, standings, commercial. Determines when each athlete run begins and ends.
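The run-boundary logic can be sketched as a pass over per-frame states. The state names are taken from the list above; the rule used here (a run opens on the first push_start frame and closes when the broadcast leaves push_start, active_run, or finish) is an assumption for illustration, as are the Run type and function name.

```python
from dataclasses import dataclass

STATES = {
    "preshow", "start_house", "push_start", "active_run",
    "finish", "replay", "standings", "commercial",
}
# Assumed: these three states make up one continuous athlete run.
RUN_STATES = {"push_start", "active_run", "finish"}


@dataclass
class Run:
    start_frame: int
    end_frame: int  # inclusive


def segment_runs(frame_states: list[str]) -> list[Run]:
    """Group consecutive run-state frames into runs, each starting at push_start."""
    runs: list[Run] = []
    run_start = None
    for i, state in enumerate(frame_states):
        if state not in STATES:
            raise ValueError(f"unknown state at frame {i}: {state!r}")
        if run_start is None and state == "push_start":
            run_start = i
        elif run_start is not None and state not in RUN_STATES:
            runs.append(Run(run_start, i - 1))
            run_start = None
    if run_start is not None:  # run still open at end of file
        runs.append(Run(run_start, len(frame_states) - 1))
    return runs
```

Because frames are extracted at 1 fps, the frame indices in each Run map directly to seconds from the start of the file.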

07 cut_policy | Rule-based

Determines optimal clip IN/OUT points from state transitions and GFX boundaries. Applies padding rules, minimum duration constraints, and handles edge cases like crashes and replays.
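Padding and minimum-duration rules of this kind can be sketched as follows. The specific padding values (2 s before, 3 s after) and the 10 s minimum are placeholders chosen for the example, not the pipeline's configured values, and crash and replay handling is left out.

```python
def apply_cut_policy(
    start_s: float,
    end_s: float,
    media_duration_s: float,
    pre_pad_s: float = 2.0,        # assumed value
    post_pad_s: float = 3.0,       # assumed value
    min_duration_s: float = 10.0,  # assumed value
) -> tuple[float, float]:
    """Return (tc_in, tc_out) in seconds, padded and clamped to the media."""
    tc_in = max(0.0, start_s - pre_pad_s)
    tc_out = min(media_duration_s, end_s + post_pad_s)

    # Enforce the minimum duration: extend the tail first, then the head
    # if the tail has already reached the end of the media.
    shortfall = min_duration_s - (tc_out - tc_in)
    if shortfall > 0:
        tc_out = min(media_duration_s, tc_out + shortfall)
        shortfall = min_duration_s - (tc_out - tc_in)
        if shortfall > 0:
            tc_in = max(0.0, tc_in - shortfall)
    return tc_in, tc_out


print(apply_cut_policy(100, 160, 3600))  # (98.0, 163.0)
```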

08 clip_creation | (none)

Assembles final clip metadata: athlete identity, clip type, timecodes (TC in/out), frame ranges, duration, and associated OCR data.
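A clip record along these lines could be modeled as a small dataclass. The field names, the 25 fps non-drop-frame timecode, and the "run" clip type are assumptions for illustration; the document lists what the metadata contains but not its exact schema.

```python
from dataclasses import dataclass, field


def seconds_to_timecode(seconds: float, fps: int = 25) -> str:
    """Format seconds as non-drop-frame HH:MM:SS:FF (25 fps is an assumed rate)."""
    total_frames = round(seconds * fps)
    ff = total_frames % fps
    hh, rem = divmod(total_frames // fps, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


@dataclass
class Clip:
    athlete: str
    clip_type: str          # hypothetical label, e.g. "run"
    in_s: float             # IN point, seconds from start of file
    out_s: float            # OUT point, seconds from start of file
    ocr_data: dict = field(default_factory=dict)

    @property
    def duration_s(self) -> float:
        return self.out_s - self.in_s

    @property
    def tc_in(self) -> str:
        return seconds_to_timecode(self.in_s)

    @property
    def tc_out(self) -> str:
        return seconds_to_timecode(self.out_s)


clip = Clip("Sample Athlete", "run", 98.0, 163.0)
print(clip.tc_in, clip.tc_out, clip.duration_s)  # 00:01:38:00 00:02:43:00 65.0
```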

09 clip_scoring | Scoring engine

Scores each clip 0.0–1.0 using a 6-factor model: completeness, name confidence, boundary quality, drama value, visual quality, and data richness.
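One plausible shape for a 6-factor score is a weighted mean of per-factor values, each in the 0.0 to 1.0 range. The factor names come from the list above; the equal default weights and the weighted-mean formula are assumptions, since the document does not state how the factors are combined.

```python
from typing import Optional

FACTORS = (
    "completeness", "name_confidence", "boundary_quality",
    "drama_value", "visual_quality", "data_richness",
)


def score_clip(factors: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Combine six factor values (each 0.0-1.0) into one 0.0-1.0 score."""
    weights = weights or {name: 1.0 for name in FACTORS}  # assumed: equal weights
    missing = set(FACTORS) - factors.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    total_weight = sum(weights[name] for name in FACTORS)
    raw = sum(weights[name] * factors[name] for name in FACTORS) / total_weight
    return min(1.0, max(0.0, raw))  # clamp in case an input strays out of range
```

Passing a custom weights dict lets a caller emphasize one factor, for example name confidence, without changing the function.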

10 export | (none)

Writes the clip catalog, generates thumbnails for IN/OUT frames, and exports selected clips. Produces JSON metadata and optional MXF clip files.

11 report_generation | (none)

Generates a comprehensive JSON report with MXF metadata, pipeline timing, clip statistics, athlete roster, full clip catalog, GFX breakdown, and detected issues.

Abort with the stop button or POST /api/scan/abort. Partial results are preserved.
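The abort endpoint can also be called from a script. This sketch assumes the server is reachable at http://localhost:8000 (the address shown in the UI preview) and that the endpoint takes no request body; the response format is not documented here, so only the HTTP status is returned.

```python
import urllib.request


def build_abort_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for the abort endpoint (empty body assumed)."""
    url = f"{base_url.rstrip('/')}/api/scan/abort"
    return urllib.request.Request(url, data=b"", method="POST")


def abort_scan(base_url: str = "http://localhost:8000", timeout: float = 10.0) -> int:
    """Send the abort request and return the HTTP status code."""
    with urllib.request.urlopen(build_abort_request(base_url), timeout=timeout) as resp:
        return resp.status
```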

Pipeline UI Preview

This is what the pipeline control interface looks like during an active scan. The status bar shows model state, VRAM usage, and IRC connection.

VIDIM v2.1 | localhost:8000

MODEL: Qwen3-VL | VRAM: 4.7 GB | IRC: CONNECTED | PIPELINE: RUNNING

MXF FILES (1/1 selected)

x2526_WC7_AL_D_4M_H1.mxf | 2.1 GB

File selector panel. Check files to include in scan.

running | ETA: 3m 12s

Launches 11-stage analysis

Status indicators show live model, VRAM, IRC connection, and pipeline state

PIPELINE STATUS

ocr | Frame 125 / 530 | 36%

frames | scene | ocr | overlay | roster | state | cut | clip | score | export | report

Green = completed stages

Cyan pulse = current active stage

Gray = pending stages

Performance

Typical Processing Time

1-hour broadcast: 5-12 min total

Frame extraction: ~45s (3600 frames)

OCR: ~3 min (primary bottleneck, ~57%)

Output: 30-50 clips, 15-20 athlete runs

Discipline Codes

4M = 4-Man Bobsleigh
2M = 2-Man Bobsleigh
WS = Women's Skeleton
MS = Men's Skeleton
WB = Women's Bobsleigh
MONO = Monobob
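The codes above lend themselves to a simple lookup table. The sketch below also tries to find a code in a file name, assuming it appears as an underscore-delimited token, as "4M" does in the sample file name x2526_WC7_AL_D_4M_H1.mxf; the naming convention itself is not documented here, so treat that part as a guess.

```python
from pathlib import Path
from typing import Optional

DISCIPLINES = {
    "4M": "4-Man Bobsleigh",
    "2M": "2-Man Bobsleigh",
    "WS": "Women's Skeleton",
    "MS": "Men's Skeleton",
    "WB": "Women's Bobsleigh",
    "MONO": "Monobob",
}


def discipline_from_filename(filename: str) -> Optional[str]:
    """Return the discipline name for the first matching token, or None."""
    # Assumed convention: tokens are separated by underscores.
    for token in Path(filename).stem.upper().split("_"):
        if token in DISCIPLINES:
            return DISCIPLINES[token]
    return None


print(discipline_from_filename("x2526_WC7_AL_D_4M_H1.mxf"))  # 4-Man Bobsleigh
```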
