Content Factory — Bojan Savić

A TypeScript + React (Remotion) video factory that compiles idea → SRT → storyboard → animated scenes → stitched MP4 / PDF carousel in a single command, and exposes the whole pipeline through a local web dashboard so I can ship LinkedIn posts and course lessons in minutes instead of hours.

In daily use. Drives weekly LinkedIn output and a course-in-progress. 50+ shipped pipeline phases to date, each scoped and reversible.

Why it exists

Short-form video is, in my experience, the highest-leverage distribution channel for technical work, but the production loop (record → edit → caption → mix → cut variants) is where the real cost lies. Every minute of finished video on a conventional non-linear editor (NLE) timeline costs 30–60 minutes of clicking. That cost compounds: one idea, four output formats, four edits.

The bet: video should be a build artifact. If the script and the visual structure are data, producing four format variants takes a --mode flag, not four edits. And if the system is a programmable pipeline, an LLM can drive it.

What it does

A single recording on my phone can become:

a noir-styled vertical Short with face-cam picture-in-picture, synced voiceover, ducked music bed, per-scene captions
a full-canvas standalone vertical for LinkedIn (no face overlay)
a 4:5 PDF carousel with one page per visual beat
a horizontal lesson with TTS narration and YouTube chapters

— from the same runtime configuration, by changing one flag.

The pipeline is also driven from a local browser dashboard: pick a mode, upload an SRT + audio, watch the storyboard plan render scene-by-scene, edit any scene’s text inline, re-emit — all without touching the terminal.

The interesting engineering

Visuals as data, not timelines

Every scene is a typed React component (TitleCard, BulletList, PullQuote, StatCallout, BigStatement, CTASlide, plus a parallel Noir theme — corkboard backdrop, manila paper, red stamps, typewriter fonts). A “video” is just a RuntimeLessonAnimation[] — a JSON-shaped description of what plays when. Remotion handles rendering; the source of truth is structured data, so the same lesson reflows correctly across 1920×1080, 1080×1920 split-facecam, 1080×1920 standalone, and 1080×1350 PDF — driven by a --mode flag, not a re-edit.
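A minimal sketch of the "video as data" idea, assuming a hypothetical shape for RuntimeLessonAnimation: the field names, the mode names, and the plan helper are illustrative, not the project's actual types. One scene list feeds four canvases; the mode only picks the canvas, and video modes lay scenes out back to back as the from / durationInFrames pair a Remotion Sequence takes.

```typescript
// Hypothetical sketch: field names, mode names, and plan() are illustrative,
// not the project's actual schema.
type SceneComponent =
  | "TitleCard"
  | "BulletList"
  | "PullQuote"
  | "StatCallout"
  | "BigStatement"
  | "CTASlide";

interface RuntimeLessonAnimation {
  component: SceneComponent;
  durationSec: number; // how long this scene plays
  props: Record<string, string | string[]>; // component-specific text fields
}

type Mode = "lesson" | "split-facecam" | "standalone" | "pdf";

// The four canvases described above, keyed by assumed mode names.
const CANVAS: Record<Mode, { width: number; height: number }> = {
  lesson: { width: 1920, height: 1080 },
  "split-facecam": { width: 1080, height: 1920 },
  standalone: { width: 1080, height: 1920 },
  pdf: { width: 1080, height: 1350 },
};

interface PlacedScene extends RuntimeLessonAnimation {
  from: number; // start frame, as passed to a Remotion <Sequence from={...}>
  durationInFrames: number; // frame count for the same <Sequence>
}

interface Plan {
  canvas: { width: number; height: number };
  pages: number; // PDF pages (one per scene); 0 for video modes
  sequences: PlacedScene[]; // back-to-back frame ranges; empty for PDF
}

// One scene list for every mode: only the canvas changes, and PDF mode
// turns each scene into a page instead of a run of frames.
function plan(mode: Mode, scenes: RuntimeLessonAnimation[], fps = 30): Plan {
  const canvas = CANVAS[mode];
  if (mode === "pdf") return { canvas, pages: scenes.length, sequences: [] };
  let cursor = 0;
  const sequences = scenes.map((scene) => {
    const durationInFrames = Math.round(scene.durationSec * fps);
    const placed: PlacedScene = { ...scene, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return placed;
  });
  return { canvas, pages: 0, sequences };
}
```

Because the scene list is identical across modes, switching output format is a lookup rather than a re-edit, which is the property a --mode flag depends on.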

SRT-first authoring

The hardest part of this kind of system is going from “I just talked into my phone” to “here’s a structured visual plan.” Two paths solve it:

A heuristic agent (storyboard-from-srt.ts) detects silence gaps, classifies each segment against six visual templates with a confidence-ranked rule set, and emits a runtime config + per-scene WAV slices + caption metadata. Free, fast, deterministic.
A /storyboard Claude Code skill does the same job with reasoning: it corrects SRT mishearings against an optional post.md ground truth, re-segments on idea boundaries instead of silence, and synthesises on-screen text that follows Mayer's multimedia-learning principles, distilled from the narration rather than mirrored from it. It also enforces a CTA as the final scene by construction.

Both paths converge on the same downstream artifacts so the rest of the pipeline is identical.
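A minimal sketch of the heuristic path, under stated assumptions: silence gaps are measured between consecutive SRT cues (the real storyboard-from-srt.ts may analyse the audio instead), and the 0.8 s threshold, the regex rules, and their confidence scores are invented for illustration. WAV slicing and config emission are omitted.

```typescript
// Hypothetical sketch of SRT-first segmentation and confidence-ranked classification.
// Threshold, rules, and scores are illustrative assumptions.
interface Cue {
  startSec: number;
  endSec: number;
  text: string;
}

type Template =
  | "TitleCard"
  | "BulletList"
  | "PullQuote"
  | "StatCallout"
  | "BigStatement"
  | "CTASlide";

// "HH:MM:SS,mmm" -> seconds.
function toSec(ts: string): number {
  const [h, m, rest] = ts.trim().split(":");
  const [s, ms] = rest.split(",");
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Minimal SRT parser: blank-line-separated blocks of index, timing line, text lines.
function parseSrt(srt: string): Cue[] {
  return srt
    .trim()
    .split(/\r?\n\s*\r?\n/)
    .map((block) => block.split(/\r?\n/))
    .filter((lines) => lines.length >= 3)
    .map(([, timing, ...text]) => {
      const [start, end] = timing.split("-->");
      return { startSec: toSec(start), endSec: toSec(end), text: text.join(" ") };
    });
}

// Start a new segment whenever the silence between two cues reaches minGapSec.
function segmentOnSilence(cues: Cue[], minGapSec = 0.8): Cue[][] {
  const segments: Cue[][] = [];
  for (const cue of cues) {
    const current = segments[segments.length - 1];
    const prev = current?.[current.length - 1];
    if (!current || !prev || cue.startSec - prev.endSec >= minGapSec) segments.push([cue]);
    else current.push(cue);
  }
  return segments;
}

// Each rule returns [template, confidence] or null; the highest confidence wins.
type Rule = (text: string, index: number, total: number) => [Template, number] | null;

const RULES: Rule[] = [
  (_t, i, n) => (i === n - 1 ? ["CTASlide", 0.95] : null), // assumed: last segment is the CTA
  (_t, i) => (i === 0 ? ["TitleCard", 0.9] : null),
  (t) => (/\d+(\.\d+)?\s*(%|x\b|percent)/i.test(t) ? ["StatCallout", 0.8] : null),
  (t) => (/["“][^"”]+["”]/.test(t) ? ["PullQuote", 0.7] : null),
  (t) => (/\b(first|second|third|finally)\b/i.test(t) ? ["BulletList", 0.6] : null),
  () => ["BigStatement", 0.1], // fallback
];

function classify(segments: Cue[][]): Template[] {
  return segments.map((segment, i) => {
    const text = segment.map((c) => c.text).join(" ");
    const ranked = RULES.map((rule) => rule(text, i, segments.length))
      .filter((m): m is [Template, number] => m !== null)
      .sort((a, b) => b[1] - a[1]);
    return ranked[0][0]; // the fallback rule guarantees at least one match
  });
}
```

Keeping the rules as plain scored functions is what makes this path deterministic: the same SRT always yields the same plan, which suits CI.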

Voiceover-authoritative sync

A long-standing pain in video tooling: "the visuals drift from the voice." This pipeline solves it by making the master WAV authoritative. The beat-emitter auto-pads scene durations to close inter-scene silence gaps, freeze-extends the final frame to match the voiceover length, and slices captions per scene from the SRT by time overlap (rather than trusting upstream caption blocks).
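A sketch of that sync rule, assuming each scene records where its voice starts and ends in the master WAV; the SceneSpan shape and the syncToVoiceover name are hypothetical, not the beat-emitter's real API. Scene windows tile the whole voiceover: each gap is absorbed by padding the preceding scene, the last scene holds until the WAV ends, and captions are chosen from the SRT by overlap with each window.

```typescript
// Hypothetical sketch: names and shapes are illustrative.
interface Cue {
  startSec: number;
  endSec: number;
  text: string;
}

interface SceneSpan {
  id: string;
  voiceStartSec: number; // where this scene's speech begins in the master WAV
  voiceEndSec: number; // where it ends
}

interface TimedScene {
  id: string;
  startSec: number;
  durationSec: number;
  captions: string[];
}

// The master WAV is the authority: scene windows tile [0, masterSec] with no gaps.
function syncToVoiceover(spans: SceneSpan[], cues: Cue[], masterSec: number): TimedScene[] {
  const sorted = [...spans].sort((a, b) => a.voiceStartSec - b.voiceStartSec);
  return sorted.map((span, i) => {
    // The first scene starts at 0; later scenes start where their voice starts.
    const startSec = i === 0 ? 0 : span.voiceStartSec;
    const next = sorted[i + 1];
    // Pad each scene up to the next scene's voice; hold the last one until the WAV ends.
    const endSec = next ? next.voiceStartSec : Math.max(masterSec, span.voiceEndSec);
    // Captions come straight from the SRT by time overlap with this window.
    const captions = cues
      .filter((c) => c.startSec < endSec && c.endSec > startSec)
      .map((c) => c.text);
    return { id: span.id, startSec, durationSec: endSec - startSec, captions };
  });
}
```

Under a pure overlap rule a cue that straddles a boundary appears in both neighbouring scenes; assigning it to the scene with the larger overlap is an equally reasonable policy.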

Face-cam composition at the stitch layer

Drop a video.mp4 in the input folder and the system extracts its audio (overwriting any existing voiceover, because video wins), uses it as the master track, and overlays the face-cam as a 480×270 picture-in-picture at one of five anchor positions, with an alpha fade-out on the tail if the face-cam is shorter than the final video. The underlying visuals are not re-rendered; the face-cam is a pure FFmpeg overlay at stitch time.
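A sketch of how the stitch step could assemble its FFmpeg arguments. The five anchor names, the 40 px margin, the 0.5 s fade, and the codec choices are assumptions, as are the helper names; the filters themselves (scale, format, fade with alpha=1, and overlay with eof_action=pass) are standard FFmpeg.

```typescript
// Hypothetical sketch: anchor names, margin, fade length, and codecs are assumptions.
type Anchor = "top-left" | "top-right" | "bottom-left" | "bottom-right" | "center";

const PIP_W = 480;
const PIP_H = 270;
const MARGIN = 40;

// FFmpeg overlay position expressions: W/H = main video size, w/h = overlay size.
const ANCHOR_XY: Record<Anchor, [string, string]> = {
  "top-left": [`${MARGIN}`, `${MARGIN}`],
  "top-right": [`W-w-${MARGIN}`, `${MARGIN}`],
  "bottom-left": [`${MARGIN}`, `H-h-${MARGIN}`],
  "bottom-right": [`W-w-${MARGIN}`, `H-h-${MARGIN}`],
  center: ["(W-w)/2", "(H-h)/2"],
};

// Video wins: -y overwrites any existing voiceover WAV with the video's audio.
function extractAudioArgs(video: string, wavOut: string): string[] {
  return ["-y", "-i", video, "-vn", "-c:a", "pcm_s16le", wavOut];
}

// Composite at stitch time. Input 0 = rendered visuals, 1 = face-cam, 2 = master WAV.
function stitchArgs(o: {
  visuals: string;
  faceCam: string;
  masterWav: string;
  out: string;
  anchor: Anchor;
  faceCamSec: number;
  finalSec: number;
  fadeSec?: number;
}): string[] {
  const fadeSec = o.fadeSec ?? 0.5;
  const [x, y] = ANCHOR_XY[o.anchor];
  let pip = `[1:v]scale=${PIP_W}:${PIP_H},format=rgba`;
  // Fade the face-cam's alpha out at its tail only when it ends before the final video.
  if (o.faceCamSec < o.finalSec) {
    const st = Math.max(0, o.faceCamSec - fadeSec).toFixed(3);
    pip += `,fade=t=out:st=${st}:d=${fadeSec}:alpha=1`;
  }
  // eof_action=pass keeps showing the visuals alone once the face-cam runs out.
  const graph = `${pip}[pip];[0:v][pip]overlay=x=${x}:y=${y}:eof_action=pass[v]`;
  return [
    "-y",
    "-i", o.visuals,
    "-i", o.faceCam,
    "-i", o.masterWav,
    "-filter_complex", graph,
    "-map", "[v]",
    "-map", "2:a", // master voiceover, not the face-cam's own track
    "-c:v", "libx264",
    "-c:a", "aac",
    o.out,
  ];
}
```

The Remotion output is reused as-is; FFmpeg still re-encodes the composited frames, but no scene is rendered again.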

A real local dashboard, not a CLI wrapper

dashboard/server.ts (~700 lines, Express, loopback-only with a per-boot token) drives the whole thing through a browser:

mode-aware controls that grey out incompatible options
lesson browser grouped by content type via an automatic classifier
voiceover upload
a storyboard plan viewer with one card per scene
inline scene editing with a per-component whitelist of text fields, atomic temp-file write, auto re-emit
two-turn approval — the /storyboard skill needs a human “go” before committing, so the dashboard mints a UUID, passes --session-id on turn 1 and --resume on every follow-up turn, preserving Claude’s in-memory context across the approval pause
live SSE terminal of stdout / stderr from every job, with a Kill button
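Two of those controls reduce to small, testable functions. The sketch below assumes hypothetical whitelist field names and hypothetical helpers (applyEdit, writeJsonAtomic, storyboardTurnArgs). The -p, --session-id, and --resume flags exist in the Claude Code CLI, but the exact way the dashboard assembles them, and the prompts it sends, are assumptions.

```typescript
import { randomUUID } from "node:crypto";
import { renameSync, unlinkSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

interface Scene {
  component: string;
  props: Record<string, unknown>;
}

// Hypothetical whitelist: the only text props the dashboard may edit per component.
const EDITABLE: Record<string, readonly string[]> = {
  TitleCard: ["title", "subtitle"],
  BulletList: ["heading", "bullets"],
  PullQuote: ["quote", "attribution"],
  StatCallout: ["stat", "label"],
  BigStatement: ["text"],
  CTASlide: ["headline", "cta"],
};

// Reject any field outside the component's whitelist; otherwise return an updated copy.
function applyEdit(scene: Scene, patch: Record<string, unknown>): Scene {
  const allowed = EDITABLE[scene.component] ?? [];
  for (const key of Object.keys(patch)) {
    if (!allowed.includes(key)) throw new Error(`${scene.component}.${key} is not editable`);
  }
  return { ...scene, props: { ...scene.props, ...patch } };
}

// Atomic write: temp file in the same directory, then rename over the target.
// On POSIX filesystems the rename is atomic, so readers never see a half-written file.
function writeJsonAtomic(path: string, data: unknown): void {
  const tmp = join(dirname(path), `.${basename(path)}.${process.pid}.tmp`);
  try {
    writeFileSync(tmp, JSON.stringify(data, null, 2));
    renameSync(tmp, path);
  } catch (err) {
    try {
      unlinkSync(tmp);
    } catch {
      // unlink failed: the temp file was most likely never created
    }
    throw err;
  }
}

// Two-turn approval: turn 1 pins a fresh UUID; every later turn resumes it,
// so Claude keeps its context across the human approval pause.
function storyboardTurnArgs(sessionId: string, turn: number, prompt: string): string[] {
  const session = turn === 1 ? ["--session-id", sessionId] : ["--resume", sessionId];
  return ["-p", ...session, prompt];
}

const sessionId = randomUUID(); // minted once per storyboard job
```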

A self-improving build system

Every pipeline run writes phase sidecars (_phase-build.json, _phase-render.json, _phase-stitch.json…). A reflection script aggregates them into SELF-IMPROVEMENT.md — a cumulative log of error counters, per-scene sync stats, bundle and render times. The next run reads its own history, so regressions surface as data, not vibes.
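A sketch of the reflection step, assuming a hypothetical sidecar shape (phase, durationMs, errors); the real _phase-*.json files carry more, such as per-scene sync stats. It rolls every run's sidecars up into a markdown table that a file like SELF-IMPROVEMENT.md could append.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sidecar shape; real sidecars carry more fields.
interface PhaseSidecar {
  phase: string; // "build", "render", "stitch", ...
  durationMs: number;
  errors: number;
}

// Collect every _phase-*.json in the given run directories.
function loadSidecars(runDirs: string[]): PhaseSidecar[] {
  return runDirs.flatMap((dir) =>
    readdirSync(dir)
      .filter((file) => /^_phase-.+\.json$/.test(file))
      .map((file) => JSON.parse(readFileSync(join(dir, file), "utf8")) as PhaseSidecar),
  );
}

// Per phase: run count, total errors, mean and worst duration, as a markdown table.
function summarise(sidecars: PhaseSidecar[]): string {
  const byPhase = new Map<string, PhaseSidecar[]>();
  for (const s of sidecars) {
    const runs = byPhase.get(s.phase) ?? [];
    runs.push(s);
    byPhase.set(s.phase, runs);
  }
  const rows = Array.from(byPhase.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([phase, runs]) => {
      const errors = runs.reduce((sum, r) => sum + r.errors, 0);
      const mean = runs.reduce((sum, r) => sum + r.durationMs, 0) / runs.length;
      const worst = Math.max(...runs.map((r) => r.durationMs));
      return `| ${phase} | ${runs.length} | ${errors} | ${Math.round(mean)} | ${worst} |`;
    });
  const header = ["| phase | runs | errors | mean ms | worst ms |", "|---|---|---|---|---|"];
  return [...header, ...rows].join("\n");
}
```

Comparing a phase's latest duration against its historical mean is one simple way such a table could flag a regression automatically.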

Tech

TypeScript end-to-end (renderer, scripts, dashboard, server)
React + Remotion for compositions; Zod schemas wired to live props editors in Remotion Studio
Node + Express for the local dashboard, SSE for live job streaming
Python + Whisper for transcription, with a domain-biased initial_prompt to fix misheard proper nouns
FFmpeg for stitching, audio ducking, padding, and picture-in-picture overlay
PDFKit for carousel export
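Two of those jobs, sketched as argument builders under assumptions: the project may call Whisper through its Python API rather than the CLI, and the model, glossary, and compressor settings here are illustrative, as are the helper names. The flags used (--model, --language, --output_format, --output_dir, --initial_prompt) belong to the openai-whisper CLI, and asplit, sidechaincompress, and amix are standard FFmpeg filters.

```typescript
// Hypothetical argument builders; values are illustrative assumptions.

// Whisper with a domain-biased initial prompt: naming the proper nouns up front
// nudges the decoder toward spelling them as they appear in the glossary.
function whisperArgs(audio: string, outDir: string, glossary: string[]): string[] {
  return [
    audio,
    "--model", "medium",
    "--language", "en",
    "--output_format", "srt",
    "--output_dir", outDir,
    "--initial_prompt", `Glossary: ${glossary.join(", ")}.`,
  ];
}

// Music ducking: input 0 = voiceover, input 1 = music bed. The voice is split so one
// copy keys a sidechain compressor on the music and the other goes to the final mix.
function duckingFilter(threshold = 0.05, ratio = 8): string {
  return [
    "[0:a]asplit=2[voice][key]",
    `[1:a][key]sidechaincompress=threshold=${threshold}:ratio=${ratio}:attack=20:release=400[ducked]`,
    "[voice][ducked]amix=inputs=2:duration=first[mix]",
  ].join(";");
}
```

With duration=first, the mix ends when the voiceover does, which keeps the master WAV authoritative over the music bed's length.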

What it demonstrates

Systems thinking on a creative problem. The interesting work isn’t the React components — it’s deciding that “a video” is a serialisable data structure, then building everything else around that decision.
Pipeline design with a human in the loop. Approval gates exist where they’re cheap (storyboard review, before TTS), not where they’re expensive (mid-render).
Build-vs-buy judgment. Remotion for deterministic rendering, FFmpeg for final composition, custom tooling only for the parts no off-the-shelf tool covers (SRT-first authoring, voiceover-authoritative sync, local web control plane).
LLM-augmented authoring without LLM lock-in. The /storyboard skill is the primary path; the heuristic CLI does the same job for free in CI. Both produce the same artifacts, so the rest of the system doesn’t care which created them.

Status

In daily use. Currently driving LinkedIn output and a course-in-progress. Roadmap (IMPROVEMENT-PLAN.md) is versioned phase-by-phase — 50+ shipped phases, each scoped, justified, reversible.