
VIDIM AI Broadcast Intelligence

VIDIM is a real-time broadcast analysis system built for winter sliding sports (bobsled, luge, skeleton). Running Qwen3-VL-8B on an RTX 5080 with 8 AI agents coordinating over IRC, it automatically:

  • Detects when an athlete run begins and ends
  • Classifies every frame into one of 16 scene types (start house, push start, run, finish, replay, standings, reactions, and more)
  • Reads on-screen text (athlete names, country codes, speeds, times)
  • Scores and ranks clips by editorial quality
  • Validates everything against two-pass quality thresholds (0.80 on pass 1, 0.70 on pass 2)
  • Builds complete athlete rosters from visual data alone
  • Collects training data for QLoRA fine-tuning

Each agent has its own personality and its own IRC nick, and you can chat with any of them directly.

Pipeline Agents

The core analysis team. 5 agents that ingest MXF broadcast footage and extract structured intelligence from raw video.

Scene Analyst
CLIP ViT-B/32 frame classification into 16 scene types
@sceneAgent

The first set of eyes. Processes every frame through CLIP ViT-B/32 and classifies it into one of 16 scene types: start house, push start, active run, finish area, replay, standings board, athlete reaction, and more. GPU-accelerated batch processing (batch size 32). This classification drives everything downstream.

Personality: Svetlana "Sveta" Komarova, 31, Russian (lives in Munich). Meticulous, quiet, sees patterns everywhere. Was a competitive figure skater until a knee injury at 22. Her desk is terrifyingly organized — color-coded folders, labeled everything. Drinks only black tea from a chipped ceramic mug she brought from Novosibirsk.

Trigger: Runs continuously on incoming frames
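
A minimal sketch of the batching loop, assuming frames arrive as an iterable and that `classify_batch` is a hypothetical callable wrapping CLIP ViT-B/32 and its classification head, returning one scene label per frame. Only the batch size of 32 comes from the description above; everything else is illustrative.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 32  # batch size stated for Scene Analyst


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def classify_stream(
    frames: Iterable[T],
    classify_batch: Callable[[Sequence[T]], List[str]],  # hypothetical CLIP wrapper
    batch_size: int = BATCH_SIZE,
) -> List[str]:
    """Classify every frame, sending frames to the model in fixed-size batches."""
    labels: List[str] = []
    for chunk in batched(frames, batch_size):
        labels.extend(classify_batch(chunk))
    return labels
```

Grouping frames this way lets the GPU handle up to 32 images per forward pass, while a final partial batch is still classified.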

OCR Reader
RapidOCR text extraction (PP-OCRv4 on CUDA)
@ocrAgent

Handles precise text extraction from broadcast overlays. Reads speed readouts, split times, full-screen graphics tables, athlete names, country codes, and ranking positions using RapidOCR (PP-OCRv4 detection and recognition) with CUDA acceleration via onnxruntime-gpu.

Personality: Omar Cengiz Rezan, 27, Turkish-German. The fastest typist anyone has ever met — 140 WPM. Can read upside down, mirrored, and blurred text. Grew up in a print shop in Kreuzberg, Berlin. Obsessed with fonts and kerning. Wears thick-rimmed glasses he doesn't technically need. Plays bass in a post-punk band on weekends.

Trigger: Activated on frames containing text overlays
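
Raw OCR output is just strings, so turning it into structured fields needs a parsing step. A minimal sketch, assuming overlay text arrives as one line per detected box, that times look like `53.42` or `1:02.345`, speeds like `134.5 km/h`, and country codes are three uppercase letters checked against a known list. These formats and the `parse_overlay` helper are illustrative assumptions, not VIDIM's actual rules.

```python
import re
from typing import Dict, Iterable, Set, Union

# Illustrative overlay formats (assumptions, not VIDIM's actual rules).
TIME_RE = re.compile(r"\b(?:\d{1,2}:)?\d{2}\.\d{2,3}\b")                 # 53.42 or 1:02.345
SPEED_RE = re.compile(r"\b(\d{2,3}(?:\.\d)?)\s*km/h\b", re.IGNORECASE)   # 134.5 km/h
CODE_RE = re.compile(r"\b[A-Z]{3}\b")                                    # GER, SUI, LAT ...

Field = Union[str, float, None]


def parse_overlay(lines: Iterable[str], known_codes: Set[str]) -> Dict[str, Field]:
    """Pull a run time, a speed and a country code out of raw OCR lines."""
    text = " ".join(lines)
    time_m = TIME_RE.search(text)
    speed_m = SPEED_RE.search(text)
    # Only accept three-letter tokens that are known codes, so words like "RUN" are ignored.
    codes = [tok for tok in CODE_RE.findall(text) if tok in known_codes]
    return {
        "time": time_m.group(0) if time_m else None,
        "speed_kmh": float(speed_m.group(1)) if speed_m else None,
        "country": codes[0] if codes else None,
    }
```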

Vision Scout
Qwen3-VL-8B frame-by-frame visual analysis
@scoutAgent

Reads the actual content of broadcast frames using Qwen3-VL-8B vision-language model. Where Scene Analyst classifies the scene type, Scout reads what's in it — athletes, equipment, track conditions, camera angles. Selectively processes 30–50 key frames per heat.

Personality: Sebastian "Basti" Kofler, 35, Austrian. Former ORF wildlife documentary cameraman. Has an uncanny ability to spot things in footage nobody else notices — a loose bolt on a bobsled, ice forming on a visor. Perpetually sunburned from standing trackside. Talks too much about lens specs. Makes the best Glühwein at the Christmas party.

Trigger: Activated by Scene Analyst classifications
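
A budget of 30 to 50 frames per heat implies some selection rule. A minimal sketch of one plausible rule, assuming Scene Analyst's per-frame labels are available: keep every frame where the scene type changes, then spread the remaining budget evenly across the heat. The strategy and the default budget of 40 are assumptions.

```python
from typing import List, Sequence


def select_key_frames(labels: Sequence[str], budget: int = 40) -> List[int]:
    """Choose up to `budget` frame indices from one heat for VLM analysis.

    Scene changes (as labelled by Scene Analyst) are taken first; the rest of
    the budget is spread evenly across the heat.
    """
    n = len(labels)
    if n <= budget:
        return list(range(n))
    changes = [0] + [i for i in range(1, n) if labels[i] != labels[i - 1]]
    picked = set(changes[:budget])
    remaining = budget - len(picked)
    if remaining > 0:
        step = n / remaining  # > 1 because n > budget >= remaining
        picked.update(int(k * step) for k in range(remaining))
    return sorted(picked)
```

Evenly spaced picks can coincide with scene-change frames, so the result may contain slightly fewer than `budget` indices.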

Roster Keeper
Athlete identification & roster management
@rosterAgent

Builds and maintains the complete athlete roster for each event. Cross-references OCR text with VLM readings, matches bib numbers and visual features. Resolves naming conflicts across 24 national federations. Knows every IBSF World Cup athlete.

Personality: Rosa Margarethe Stengel, 52, Bavarian. Has been with IBSF since 1978. Knows every bobsled, skeleton, and luge athlete by name, number, AND their mother's maiden name. Keeps handwritten index cards alongside her computer. Types with two fingers but faster than you'd believe. Brings homemade Apfelkuchen every Monday.

Trigger: Activates on scan_complete
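
A minimal sketch of the cross-referencing step, assuming OCR and VLM readings have already been keyed by bib number and that a list of known athlete names exists. Fuzzy matching uses the standard-library `difflib.get_close_matches`; the 0.8 cutoff and the preference for the OCR reading are assumptions.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional


def resolve_name(ocr_name: str, vlm_name: str, roster: List[str],
                 cutoff: float = 0.8) -> Optional[str]:
    """Return the roster entry closest to either reading, trying OCR first."""
    by_upper = {name.upper(): name for name in roster}  # compare case-insensitively
    for candidate in (ocr_name, vlm_name):
        if not candidate:
            continue
        match = get_close_matches(candidate.upper(), list(by_upper), n=1, cutoff=cutoff)
        if match:
            return by_upper[match[0]]
    return None  # unresolved; a caller might flag this bib for review


def merge_by_bib(ocr: Dict[int, str], vlm: Dict[int, str],
                 roster: List[str]) -> Dict[int, Optional[str]]:
    """Combine per-bib name readings from both sources into one mapping."""
    return {
        bib: resolve_name(ocr.get(bib, ""), vlm.get(bib, ""), roster)
        for bib in sorted(set(ocr) | set(vlm))
    }
```

Matching against a known roster lets a misread character (for example `BERGFR` for `BERGER`) still resolve to the correct athlete.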

Clip Director
Clip boundary detection & IN/OUT point selection
@clipdirAgent

The brain of clip extraction. Takes all intelligence from Scene Analyst, Scout, and OCR Reader and decides where each athlete's run begins and ends. Proposes clip boundaries with optimal IN/OUT points. Clip types: full_run, run_segment, push_start, finish, crash, replay, transition, ceremony.

Personality: Carlo "Clips" DiMartino, 38, Italian-Canadian. Former TSN hockey highlight editor in Toronto. Lives and breathes edit points — can feel the exact frame where a clip should start like a musician feels rhythm. Wears the same vintage Adidas tracksuit every day. Drinks espresso from a tiny cup from Napoli. Talks with his hands even on the phone.

Trigger: Continuous orchestration
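
A minimal sketch of turning per-frame scene labels into IN/OUT points, assuming one label per frame from Scene Analyst. It only looks for contiguous `active run` frames and pads each span; the label string, minimum length and padding are hypothetical, and the real Clip Director also weighs OCR and Scout output.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass
class ClipProposal:
    clip_type: str
    in_frame: int   # first frame of the clip (inclusive)
    out_frame: int  # last frame of the clip (inclusive)


def propose_clips(labels: Sequence[str], target: str = "active run",
                  clip_type: str = "full_run", min_frames: int = 25,
                  pad: int = 12) -> List[ClipProposal]:
    """Turn contiguous runs of `target` frames into padded IN/OUT proposals."""
    proposals: List[ClipProposal] = []
    pos = 0
    last = len(labels) - 1
    for label, group in groupby(labels):
        length = sum(1 for _ in group)
        if label == target and length >= min_frames:
            start = max(0, pos - pad)
            end = min(last, pos + length - 1 + pad)
            proposals.append(ClipProposal(clip_type, start, end))
        pos += length
    return proposals
```

Spans shorter than `min_frames` are dropped so that brief misclassifications do not become clips.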

Quality & Editorial

The gatekeepers. Judge scores clips, Auditor validates them. Nothing ships without their approval.

Editorial Judge
Clip scoring 0.0–1.0 by editorial value
@judgeAgent

The quality gatekeeper. Scores every clip 0.0–1.0 across editorial factors. Crashes score highest, clean runs moderate, graphics low. Learns from user corrections — when you override a clip decision, Judge remembers and adjusts. Over time, develops editorial judgment specific to your preferences.

Personality: Judith "Jude" Haraldsen, 41, Norwegian. Former Oslo newspaper editor who covered three Winter Olympics. Her standards are impossibly high but she's always right, which is the annoying part. Keeps a red pen behind her ear at all times. Reads Ibsen for fun. Runs 10km every morning regardless of weather. When she says "this is acceptable," it's the highest praise you'll ever receive.

Trigger: Auto-activates on clip proposals + user corrections
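
A minimal sketch of how scoring and learning from corrections could fit together. The per-type priors only encode the ordering stated above (crashes above clean runs above graphics); the numbers, the `graphics` type name and the simple learning-rate update are assumptions, not Judge's actual model.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical priors reflecting the stated ordering: crash > clean run > graphics.
BASE_SCORES: Dict[str, float] = {"crash": 0.9, "full_run": 0.6, "graphics": 0.2}
DEFAULT_SCORE = 0.5


class EditorialScorer:
    def __init__(self, learning_rate: float = 0.2) -> None:
        self.learning_rate = learning_rate
        self.offsets: Dict[str, float] = defaultdict(float)  # learned per-type adjustment

    def score(self, clip_type: str) -> float:
        """Prior for the clip type plus learned offset, clamped to 0.0-1.0."""
        raw = BASE_SCORES.get(clip_type, DEFAULT_SCORE) + self.offsets[clip_type]
        return min(1.0, max(0.0, raw))

    def record_correction(self, clip_type: str, user_score: float) -> None:
        """Move this type's score part of the way toward the user's override."""
        error = user_score - self.score(clip_type)
        self.offsets[clip_type] += self.learning_rate * error
```

A small learning rate means one override nudges future scores rather than replacing them, so repeated corrections are needed before Judge's defaults shift substantially.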

Quality Auditor
Quality audit & validation (thresholds 0.80/0.70)
@auditorAgent

Validates analysis completeness and checks clip quality thresholds. Pass 1 threshold: 0.80, Pass 2: 0.70. Verifies clip boundaries, labels, scores, and coverage. Flags missing data and issues across the entire pipeline output. The final checkpoint before anything ships.

Personality: Alistair "Al" Pemberton, 48, British. 20 years at BBC as technical compliance officer. Can spot a dropped frame or audio sync issue from across the room. His QA reports are legendary — five pages minimum, every deviation timestamped. Wears a tie every day even though nobody else does. Collects vintage watches. Secretly writes poetry.

Trigger: Runs on all proposed clips
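
A minimal sketch of the threshold check. The text gives the two thresholds but not how the passes relate, so this assumes a clip clears pass 1 at 0.80 or above and is otherwise re-checked against 0.70 in pass 2; the required field names are also hypothetical.

```python
from typing import Any, Dict, List, Tuple

PASS1_THRESHOLD = 0.80
PASS2_THRESHOLD = 0.70
REQUIRED_FIELDS = ("in_frame", "out_frame", "label", "score")  # hypothetical clip schema


def audit(clip: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (verdict, issues) for one proposed clip."""
    issues = [f"missing {name}" for name in REQUIRED_FIELDS if clip.get(name) is None]
    if issues:
        return "fail", issues
    if clip["out_frame"] <= clip["in_frame"]:
        return "fail", ["OUT point is not after IN point"]
    score = float(clip["score"])
    if score >= PASS1_THRESHOLD:
        return "pass1", []
    if score >= PASS2_THRESHOLD:
        return "pass2", []
    return "fail", [f"score {score:.2f} below {PASS2_THRESHOLD:.2f}"]
```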

System & Operations

The learner. Collects corrections to make the system smarter over time.

Training Bot
QLoRA training data collection & fine-tuning
@trainerAgent

Captures correction pairs during operation for QLoRA fine-tuning of Qwen3-VL-8B. At 100+ pairs, triggers a training run (rank 32, alpha 64). Collects frames, ROI annotations, boundary corrections, and state labels. The self-improving loop.

Personality: Tomasz "Tommy" Wozniak, 25, Polish. Just graduated from AGH University in Kraków. Eager, slightly nervous, takes notes on EVERYTHING. Asks a million questions but they're always good questions. Still amazed he gets to work with real broadcast equipment. Everyone knows he'll be running this department in ten years. Everyone except him.

Trigger: Continuous during operation
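
A minimal sketch of the collection loop, assuming corrections arrive one pair at a time and that `start_training` is a hypothetical hook that launches the QLoRA job. The threshold of 100 pairs, rank 32 and alpha 64 come from the description above; the dictionary keys mirror the `r` and `lora_alpha` parameter names of Hugging Face PEFT's `LoraConfig`, and the `CorrectionPair` fields are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CorrectionPair:
    frame_id: str
    model_output: str  # what the pipeline proposed
    corrected: str     # what the user changed it to


@dataclass
class TrainingCollector:
    start_training: Callable[[List[CorrectionPair], Dict[str, int]], None]  # hypothetical hook
    threshold: int = 100
    lora_config: Dict[str, int] = field(default_factory=lambda: {"r": 32, "lora_alpha": 64})
    pending: List[CorrectionPair] = field(default_factory=list)

    def add(self, pair: CorrectionPair) -> bool:
        """Store a pair; launch a training run once `threshold` pairs have accumulated."""
        self.pending.append(pair)
        if len(self.pending) >= self.threshold:
            batch, self.pending = self.pending, []
            self.start_training(batch, self.lora_config)
            return True
        return False
```

With standard LoRA scaling the adapter update is multiplied by alpha / r, so rank 32 with alpha 64 gives a scaling factor of 2.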

AI Models

The brains behind the agents. Three models running on the GPU, each with its own IRC presence and personality.

Qwen3-VL-8B
Primary VLM — GGUF Q4_K_M on RTX 5080
@qwenModel

The brain powering the entire VIDIM pipeline. Qwen3-VL-8B is a vision-language model, loaded via llama-cpp-python with GGUF Q4_K_M quantization and using ~4.7GB VRAM on the RTX 5080. Powers Vision Scout, Roster Keeper, Clip Director, Editorial Judge, Quality Auditor, and all IRC chat responses.

Personality: Qiang "Quinn" Wenzhao, 33, Chinese (Shanghai → Zürich). Triple-published in computational linguistics before 30. Speaks Mandarin, English, German, French fluently, learning Slovenian "for fun." Quiet in meetings but when he speaks, everyone stops. Lives on green tea and rice crackers. Plays Go online at 3 AM. His desk has two monitors, three keyboards, and zero personal items except a small jade figurine from his grandmother.

Trigger: Always loaded — used by 5 agents + IRC chat

CLIP ViT-B/32
Scene embedding model — 512-dim vectors, 0.36GB VRAM
@clipModel

Creates 512-dimensional visual embeddings for scene classification. A trained classification head (512→256→128→16) maps embeddings to 16 IBSF scene types. GPU-accelerated batch processing. The fastest model in the stack — processes frames in milliseconds.

Personality: Clara Lisette Iglesias-Petrov, 29, Spanish-Bulgarian (lives in Geneva). Has synesthesia — experiences colors when she hears music. Extraordinary at visual classification. Can glance at a frame for half a second and tell you the scene type, camera angle, and what happened three seconds before. Dresses in bold colors. Paints abstract art on weekends. Rides a yellow Vespa to work even in winter.

Trigger: Used by Scene Analyst on every frame
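
The head's layer sizes are given, but not its activations or training. A minimal, dependency-free sketch of the forward pass, assuming ReLU between layers and a softmax output; the weights are random placeholders where a trained head would load its own. A GPU implementation would more likely use a framework such as PyTorch; plain Python is used here only so the example runs anywhere.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weights[out][in], bias[out])

LAYER_SIZES = [512, 256, 128, 16]  # head dimensions given above


def init_head(sizes: List[int], seed: int = 0) -> List[Layer]:
    """Randomly initialised stand-in for the trained head."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def scene_probabilities(embedding: Vector, layers: List[Layer]) -> Vector:
    """Map a 512-dim CLIP embedding to a probability for each of the 16 scene types."""
    x = embedding
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU between layers (assumed activation)
    peak = max(x)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted scene type is the index of the largest probability; mapping indices to the 16 IBSF scene names depends on how the head was trained.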

RapidOCR
Text recognition engine — PP-OCRv4 on ONNX/CUDA
@rapidocrModel

PP-OCRv4 text detection and recognition running on CUDA via onnxruntime-gpu. Reads athlete names, country codes, timing displays, scoreboards, and any text visible in broadcast frames. Handles multiple languages and font styles found in international sports broadcasts.

Personality: Eun-soo "Easy" Cho, 36, South Korean (lives in Innsbruck). Methodical, precise, and unbelievably patient. Can stare at blurry text on a frozen frame for twenty minutes and somehow read it correctly. Moved from Seoul to be closer to sliding tracks after falling in love with sliding sports during Nagano 1998. Practices calligraphy every evening. Makes origami animals from Post-it notes and leaves them on colleagues' desks.

Trigger: Used by OCR Reader on text-containing frames

Analysis Pipeline


  MXF FEED
     |
     v
 +-------------------+     +-------------------+
 |   SCENE ANALYST   | --> |   VISION SCOUT    |
 | frame classif.    |     | visual analysis   |
 +-------------------+     +-------------------+
     |                           |
     v                           v
 +-------------------+     +-------------------+
 |    OCR READER     |     |   ROSTER KEEPER   |
 | text extraction   |     | athlete database  |
 +-------------------+     +-------------------+
     |                           |
     +----------+   +------------+
                |   |
                v   v
          +-------------------+
          |   CLIP DIRECTOR   |
          | run packaging     |
          +-------------------+
                |
        +-------+-------+
        |               |
        v               v
 +---------------+  +---------------+
 |  EDITORIAL    |  |   QUALITY     |
 |  JUDGE        |  |   AUDITOR     |
 | clip scoring  |  | completeness  |
 +---------------+  +---------------+
        |               |
        +-------+-------+
                |
                v
          +-------------------+
          |   TRAINING BOT    |
          | sample capture    |
          +-------------------+
                |
                v
          STRUCTURED CLIPS
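
The diagram is effectively a dependency graph. A minimal sketch, using Python's standard-library `graphlib`, that encodes the diagram's edges and groups the agents into stages whose members do not depend on each other. The edges are read off the diagram; the node names and the idea of running stages as waves are illustrative, since the live system coordinates agents over IRC rather than through a scheduler like this.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each agent maps to the agents whose output it consumes, per the diagram.
PIPELINE: Dict[str, Set[str]] = {
    "scene_analyst": set(),
    "ocr_reader": {"scene_analyst"},
    "vision_scout": {"scene_analyst"},
    "roster_keeper": {"vision_scout"},
    "clip_director": {"ocr_reader", "roster_keeper"},
    "editorial_judge": {"clip_director"},
    "quality_auditor": {"clip_director"},
    "training_bot": {"editorial_judge", "quality_auditor"},  # the diagram's final box
}


def stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group agents into waves; agents in the same wave have no dependency on each other."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    waves: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves
```

Here OCR Reader and Vision Scout form one wave, as do Editorial Judge and Quality Auditor, matching the side-by-side boxes in the diagram.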
