People keep asking how our multi-shot engine actually renders videos. Short answer: duct tape, Puppeteer, and a lot of careful engineering. Long answer below.
The problem
When we started, no video-generation API offered multi-shot in a single render. Runway had it in their web UI, but not in their API. Every other provider forced us to render shots individually and stitch them — which broke visual consistency and doubled our costs.
The pragmatic choice: browser automation
Instead of waiting for APIs to catch up, we built a browser automation layer that drives Runway’s web interface directly. Each job spins up a headless Chrome instance, logs into a pre-authorized account, fills out the storyboard UI, clicks render, and waits.
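In code, the driver script looks roughly like the sketch below. Every selector and helper name here is an illustrative assumption, not Runway’s real DOM (which changes under us regularly); the only load-bearing parts are the stealth plugin, the persistent profile, and the poll-for-`<video>` loop.

```javascript
// Rough sketch of lib/runway-browser.mjs. Selectors are made up for
// illustration; the structure mirrors the pipeline described above.
const POLL_MS = 5000;

// Hypothetical helper: CSS selector for the Nth scene prompt box.
export function sceneSelector(index) {
  return `[data-testid="scene-prompt-${index}"]`;
}

export async function renderStoryboard(scenes) {
  // Lazy imports so the pure helpers above load without a browser installed.
  const puppeteer = (await import('puppeteer-extra')).default;
  const StealthPlugin = (await import('puppeteer-extra-plugin-stealth')).default;
  puppeteer.use(StealthPlugin());

  const browser = await puppeteer.launch({
    headless: true,
    userDataDir: './chrome-profile', // persistent profile keeps the OAuth session alive
  });
  const page = await browser.newPage();
  await page.goto('https://app.runwayml.com'); // assumed entry URL

  // Fill each scene of the storyboard, then kick off the render.
  for (let i = 0; i < scenes.length; i++) {
    await page.type(sceneSelector(i), scenes[i]);
  }
  await page.click('[data-testid="generate"]'); // illustrative selector

  // Poll the DOM until the <video> element shows up, then grab its src.
  let src = null;
  while (!src) {
    src = await page.$eval('video', v => v.src).catch(() => null);
    if (!src) await new Promise(r => setTimeout(r, POLL_MS));
  }
  await browser.close();
  return src; // caller downloads the MP4 and uploads it to R2
}
```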
Client → POST /api/jobs (SQLite record)
→ startJobWorker(jobId)
→ execFile("node lib/runway-browser.mjs")
→ Puppeteer + stealth plugin
→ fills scenes in Runway Multishot UI
→ triggers Generate
→ polls DOM for <video> element
→ downloads MP4
→ uploads to Cloudflare R2
→ updates job status = completed
The queue
Jobs are stored in a SQLite file with a status column: pending, processing, completed, or failed. A background worker polls for pending jobs and runs them through the browser automation. Because SQLite is just a file and the worker is a long-running Node process, we survive restarts: on boot, we requeue anything left in-flight.
This is the reason you can close your browser tab during a render. The job is server-side state; your tab is just a viewer polling a public endpoint for progress.
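The queue semantics are small enough to show in full. This sketch models them with an in-memory map for clarity; the real store is a SQLite table, and each method corresponds to one SQL statement (noted in comments):

```javascript
// Job-queue semantics from the post, modeled in memory. The real
// implementation keeps this state in a SQLite file.
export function makeQueue() {
  const jobs = new Map(); // id -> 'pending' | 'processing' | 'completed' | 'failed'
  let nextId = 1;
  return {
    // INSERT INTO jobs (status) VALUES ('pending')
    enqueue() { const id = nextId++; jobs.set(id, 'pending'); return id; },
    // Worker poll: claim the oldest pending job and mark it processing.
    claim() {
      for (const [id, status] of jobs) {
        if (status === 'pending') { jobs.set(id, 'processing'); return id; }
      }
      return null;
    },
    finish(id, ok = true) { jobs.set(id, ok ? 'completed' : 'failed'); },
    // On boot: UPDATE jobs SET status = 'pending' WHERE status = 'processing'
    requeueInFlight() {
      let n = 0;
      for (const [id, status] of jobs) {
        if (status === 'processing') { jobs.set(id, 'pending'); n++; }
      }
      return n;
    },
    status(id) { return jobs.get(id); },
  };
}
```

The requeue step is the whole restart-survival story: a crash mid-render leaves a row stuck at processing, and the boot pass flips it back to pending so the worker picks it up again.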
What we learned the hard way
- Headless Chrome is memory-hungry: budget about 1GB per instance. A Hetzner VPS handles it; Vercel serverless does not.
- Google OAuth hates automation. The stealth plugin plus a persistent Chrome profile kept our session alive, but anti-bot detection was a year-long arms race.
- File cleanup matters. After uploading to R2, we delete the local file; without that, a 35GB disk fills up in a week.
- Status polling needs backoff. Our first iteration polled every 1s and effectively DDoS'd our own API; backing off to 5s feels right.
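The polling lesson in code. The 1s start and 5s cap mirror the numbers above; the doubling curve in between is one reasonable choice, not the only one:

```javascript
// Exponential backoff for status polling: start at 1s, double per attempt,
// cap at 5s. Keeps the first response snappy without hammering the API.
export function nextDelay(attempt, baseMs = 1000, capMs = 5000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Poll a status endpoint until the job reaches a terminal state.
// fetchStatus is any async function returning the current status string.
export async function pollStatus(fetchStatus, { maxAttempts = 120 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === 'completed' || status === 'failed') return status;
    await new Promise(r => setTimeout(r, nextDelay(attempt)));
  }
  throw new Error('polling timed out');
}
```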
What is next
We are migrating to a proper message queue (BullMQ + Redis) to handle concurrency across multiple worker pools. That opens the door to real priority tiers — not just "faster queue" but "dedicated hardware" for Studio customers.
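A sketch of what that migration might look like. The tier names and priority values are assumptions about our future setup; the BullMQ detail worth knowing is that a lower priority number is served first:

```javascript
// Map customer tiers to BullMQ priorities. In BullMQ, lower numbers
// are served first. Tier names and values here are assumptions.
const TIER_PRIORITY = { studio: 1, pro: 5, free: 10 };

export function priorityFor(tier) {
  return TIER_PRIORITY[tier] ?? TIER_PRIORITY.free;
}

// Guarded behind a function so this file loads without Redis available.
export async function startRenderQueue(connection = { host: 'localhost', port: 6379 }) {
  const { Queue, Worker } = await import('bullmq');
  const queue = new Queue('renders', { connection });
  const worker = new Worker(
    'renders',
    async job => {
      // same browser-automation entry point as today, e.g.:
      // await runRenderJob(job.data.jobId);
    },
    { connection, concurrency: 2 } // ~1GB per Chrome instance caps concurrency
  );
  return { queue, worker };
}

// Enqueue with a tier-based priority:
// await queue.add('render', { jobId }, { priority: priorityFor('studio') });
```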
Longer-term, we are training our own multi-shot model. Browser automation was the right choice to launch in 3 weeks. It is not the right choice to scale to a million videos per month. When we flip the switch, the change will be invisible to users — the API stays the same, just faster and cheaper.
“Build the thing that works in days. Then replace the hacks while users use the product.”