People keep asking how our multi-shot engine actually renders videos. Short answer: duct tape, Puppeteer, and a lot of careful engineering. Long answer below.
The problem
When we started, no video-generation API offered multi-shot in a single render. Runway had it in their web UI, but not in their API. Every other provider forced us to render shots individually and stitch them — which broke visual consistency and doubled our costs.
The pragmatic choice: browser automation
Instead of waiting for APIs to catch up, we built a browser automation layer that drives Runway’s web interface directly. Each job spins up a headless Chrome instance, logs into a pre-authorized account, fills out the storyboard UI, clicks render, and waits.
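In code, the driver script looks roughly like the sketch below. Every selector and helper name here is an illustrative assumption, not Runway’s real DOM (which changes under us regularly); the only load-bearing parts are the stealth plugin, the persistent profile, and the poll-for-`<video>` loop.

```javascript
// Rough sketch of lib/runway-browser.mjs. Selectors are made up for
// illustration; the structure mirrors the pipeline described above.
const POLL_MS = 5000;

// Hypothetical helper: CSS selector for the Nth scene prompt box.
export function sceneSelector(index) {
  return `[data-testid="scene-prompt-${index}"]`;
}

export async function renderStoryboard(scenes) {
  // Lazy imports so the pure helpers above load without a browser installed.
  const puppeteer = (await import('puppeteer-extra')).default;
  const StealthPlugin = (await import('puppeteer-extra-plugin-stealth')).default;
  puppeteer.use(StealthPlugin());

  const browser = await puppeteer.launch({
    headless: true,
    userDataDir: './chrome-profile', // persistent profile keeps the OAuth session alive
  });
  const page = await browser.newPage();
  await page.goto('https://app.runwayml.com'); // assumed entry URL

  // Fill each scene of the storyboard, then kick off the render.
  for (let i = 0; i < scenes.length; i++) {
    await page.type(sceneSelector(i), scenes[i]);
  }
  await page.click('[data-testid="generate"]'); // illustrative selector

  // Poll the DOM until the <video> element shows up, then grab its src.
  let src = null;
  while (!src) {
    src = await page.$eval('video', v => v.src).catch(() => null);
    if (!src) await new Promise(r => setTimeout(r, POLL_MS));
  }
  await browser.close();
  return src; // caller downloads the MP4 and uploads it to R2
}
```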
Client → POST /api/jobs (SQLite record)
→ startJobWorker(jobId)
→ execFile("node lib/runway-browser.mjs")
→ Puppeteer + stealth plugin
→ fills scenes in Runway Multishot UI
→ triggers Generate
→ polls DOM for <video> element
→ downloads MP4
→ uploads to Cloudflare R2
→ updates job status = completed
The queue
Jobs are stored in a SQLite file with a status column: pending, processing, completed, or failed. A background worker polls for pending jobs and runs them through the browser automation. Because SQLite is just a file and the worker is a long-running Node process, we survive restarts: on boot, we requeue anything left in-flight.
This is the reason you can close your browser tab during a render. The job is server-side state; your tab is just a viewer polling a public endpoint for progress.
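The queue semantics are small enough to show in full. This sketch models them with an in-memory map for clarity; the real store is a SQLite table, and each method corresponds to one SQL statement (noted in comments):

```javascript
// Job-queue semantics from the post, modeled in memory. The real
// implementation keeps this state in a SQLite file.
export function makeQueue() {
  const jobs = new Map(); // id -> 'pending' | 'processing' | 'completed' | 'failed'
  let nextId = 1;
  return {
    // INSERT INTO jobs (status) VALUES ('pending')
    enqueue() { const id = nextId++; jobs.set(id, 'pending'); return id; },
    // Worker poll: claim the oldest pending job and mark it processing.
    claim() {
      for (const [id, status] of jobs) {
        if (status === 'pending') { jobs.set(id, 'processing'); return id; }
      }
      return null;
    },
    finish(id, ok = true) { jobs.set(id, ok ? 'completed' : 'failed'); },
    // On boot: UPDATE jobs SET status = 'pending' WHERE status = 'processing'
    requeueInFlight() {
      let n = 0;
      for (const [id, status] of jobs) {
        if (status === 'processing') { jobs.set(id, 'pending'); n++; }
      }
      return n;
    },
    status(id) { return jobs.get(id); },
  };
}
```

The requeue step is the whole restart-survival story: a crash mid-render leaves a row stuck at processing, and the boot pass flips it back to pending so the worker picks it up again.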
What we learned the hard way
- Headless Chrome is memory-hungry: budget about 1GB per instance. A Hetzner VPS handles it; Vercel serverless does not.
- Google OAuth hates automation. The stealth plugin plus a persistent Chrome profile kept our session alive, but anti-bot detection was a year-long arms race.
- File cleanup matters. After uploading to R2, we delete the local file; without that, a 35GB disk fills up in a week.
- Status polling needs backoff. Our first iteration polled every 1s and effectively DDoS'd our own API; backing off to 5s feels right.
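The polling lesson in code. The 1s start and 5s cap mirror the numbers above; the doubling curve in between is one reasonable choice, not the only one:

```javascript
// Exponential backoff for status polling: start at 1s, double per attempt,
// cap at 5s. Keeps the first response snappy without hammering the API.
export function nextDelay(attempt, baseMs = 1000, capMs = 5000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Poll a status endpoint until the job reaches a terminal state.
// fetchStatus is any async function returning the current status string.
export async function pollStatus(fetchStatus, { maxAttempts = 120 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === 'completed' || status === 'failed') return status;
    await new Promise(r => setTimeout(r, nextDelay(attempt)));
  }
  throw new Error('polling timed out');
}
```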
What is next
We are migrating to a proper message queue (BullMQ + Redis) to handle concurrency across multiple worker pools. That opens the door to real priority tiers — not just "faster queue" but "dedicated hardware" for Studio customers.
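A sketch of what that migration might look like. The tier names and priority values are assumptions about our future setup; the BullMQ detail worth knowing is that a lower priority number is served first:

```javascript
// Map customer tiers to BullMQ priorities. In BullMQ, lower numbers
// are served first. Tier names and values here are assumptions.
const TIER_PRIORITY = { studio: 1, pro: 5, free: 10 };

export function priorityFor(tier) {
  return TIER_PRIORITY[tier] ?? TIER_PRIORITY.free;
}

// Guarded behind a function so this file loads without Redis available.
export async function startRenderQueue(connection = { host: 'localhost', port: 6379 }) {
  const { Queue, Worker } = await import('bullmq');
  const queue = new Queue('renders', { connection });
  const worker = new Worker(
    'renders',
    async job => {
      // same browser-automation entry point as today, e.g.:
      // await runRenderJob(job.data.jobId);
    },
    { connection, concurrency: 2 } // ~1GB per Chrome instance caps concurrency
  );
  return { queue, worker };
}

// Enqueue with a tier-based priority:
// await queue.add('render', { jobId }, { priority: priorityFor('studio') });
```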
Longer-term, we are training our own multi-shot model. Browser automation was the right choice to launch in 3 weeks. It is not the right choice to scale to a million videos per month. When we flip the switch, the change will be invisible to users — the API stays the same, just faster and cheaper.
“Build the thing that works in days. Then replace the hacks while users use the product.”