Video: the honest path from script to shipped clip
Yesterday you briefed images like an art director. Today, the medium with the loudest hype and the most quietly useful reality: video. By the end of this lesson you'll have shipped a real, captioned clip for $0 — and you'll know a true story about a famous tool's death that will keep saving you money for years.
Useful AI video is three jobs: WRITE the script, FRONT it with a presenter, EDIT and caption it. In mid-2026 all three have a free path. A fourth job, generating raw footage from a text description (the "text-to-footage" you see in demos), is the one that still mostly costs money.
1. First, a true story: the tool that was "the future"
In 2025, OpenAI launched an app called Sora. You typed a description and it generated short video clips — and around them, a whole social feed of AI-made footage. Tech headlines called it the future of video. People queued for invite codes. If you read about AI at all that year, you read about Sora.
In 2026, OpenAI discontinued the consumer Sora app and the support around it. The app reportedly cost on the order of a million dollars a day to run, its audience shrank after the novelty wore off, and the company moved its computing power elsewhere. The detail to remember is not the date; it is the pattern: even a famous AI tool from a leading company can disappear.
Here's the part that matters for you. Nobody who used Sora did anything wrong — they used a popular tool from a leading company, exactly as anyone sensible would. But everyone whose skill was "I know my way around the Sora app" lost that skill overnight. And everyone whose skill was "I can write a tight 60-second script, get it presented on camera, caption it and ship it" lost nothing — they swapped one tool out of their workflow and kept moving.
We'll call this the Sora parable, and it's the deepest lesson of the day: tools die; workflows and skills survive. It's also why this course has been teaching you capabilities and habits instead of buttons and version numbers since Day 1. Buttons move. The job of "make a clear, watchable video" does not. Keep the parable in your pocket — it comes back on Day 10 when you future-proof your whole toolkit.
2. "AI video" is actually three jobs, and all three have a free path
When people say "AI video" they're usually mashing together three very different jobs with three very different price tags. Pull them apart and the scary topic becomes a tidy pipeline:
- WRITE — someone has to decide what the video says. This is where your chat assistant is genuinely excellent, and it's free. It's also the job that decides whether the video is worth watching at all. Day 6's rule applies word for word: Think → Make — the thinking happens in the chat assistant, never in the making tool.
- FRONT — someone has to appear on screen and say it. An avatar tool turns your script into a video of a realistic AI presenter reading it to camera. The generous free taste here is Synthesia: a free tier with a small monthly allowance of avatar video and a watermark — plenty for practice and for personal or internal use. (HeyGen has a free tier too, though a thinner one.) The exact free minutes change often, so check the current limits inside the app before you plan around a number. And the zero-tech alternative is always open: front it yourself, with your phone. That's not cheating; it's often better.
- EDIT — someone has to trim it, caption it and export a file you can actually share. CapCut (the free desktop editor) does this for $0 as of mid-2026: real editing, automatic captions, and no watermark added on a plain export. (CapCut shuffles features between free and Pro now and then — if your screen differs, the workflow still holds.) Captions aren't decoration, by the way — a large share of social video plays on mute, so a video without captions is half a video.
Notice what this pipeline is: the staff model from Day 3, applied to video. Claude (or your daily driver) is the screenwriter. Synthesia is the on-screen talent. CapCut is the editing room. You're not "using an AI video tool"; you're routing three jobs to three specialists, and the only job that costs money is a fourth one we haven't covered yet. That's section 4.
3. Write for the ear, not the eye
Before the walkthrough, the one craft skill that makes everything downstream better: spoken language is not written language. A paragraph that reads beautifully on a page sounds stiff and breathless when a person — or an avatar — says it out loud. Three rules cover most of it:
- Hook in the first 5 seconds. Viewers decide almost instantly whether to keep watching. Open with a question or a surprising claim — not your name, not "in this video I will…".
- Budget ~150 words per minute. That's roughly the pace of natural speech. A 60-second video is about 150 words. That is shorter than you think, which is good news: short scripts are easier to write well, and they use fewer of your free avatar minutes.
- Short sentences, one idea each. Contractions are welcome. If you wouldn't say a sentence to a friend across a table, it doesn't belong in a script.
Written for the eye: "In this video I will explain three important considerations when configuring your club's sign-up process."
Written for the ear: "Ever wonder why half the club never finishes sign-up? Three fixes. Here's the first."
You don't have to remember these rules in the moment — you put them into the brief. The Briefing Formula from Day 2 (Role · Goal · Context · Format · Tone) works for scripts exactly as it worked for emails and for Day 7's Image Brief, and the walkthrough below hands you a complete one to copy.
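The 150-words-per-minute budget is simple arithmetic, so a few lines of Python can check a draft before it spends any avatar minutes. This is a minimal sketch, not part of the course toolkit: the function names `estimate_seconds` and `word_budget` are made up for illustration, and 150 wpm is the lesson's rule of thumb, so a real presenter may run a little faster or slower.

```python
import math

WORDS_PER_MINUTE = 150  # the lesson's rule-of-thumb speaking pace (an assumption, not a measurement)


def estimate_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate how long a script takes to say at the given pace."""
    words = len(script.split())  # whitespace-separated words
    return words / wpm * 60


def word_budget(seconds: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Largest whole number of words that fits in the given duration."""
    return math.floor(seconds * wpm / 60)


# The "written for the ear" example from this section.
ear_version = (
    "Ever wonder why half the club never finishes sign-up? "
    "Three fixes. Here's the first."
)

print(word_budget(60))                          # 150 words for a 60-second script
print(word_budget(30))                          # 75 words for a 30-second cut
print(round(estimate_seconds(ear_version), 1))  # 5.6 seconds for the 14-word hook
```

Run on the "written for the ear" example, the estimate comes out a little over five seconds, which is about the length a hook should be. Paste your own draft into `estimate_seconds` to see whether it fits a 60-second or 30-second slot.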
4. The paid frontier
The fourth job — the one in the jaw-dropping demos — is text-to-footage: you type "a drone shot over a foggy coastline at dawn" and a model generates original cinematic footage from nothing. That job is real, it's improving fast, and in mid-2026 it's the one job that still mostly sits behind a paywall. The mainstream path is Google's Veo, used through a filmmaking tool called Flow. Google does hand out a small free taste — the free tier includes a modest pool of monthly credits in Flow, enough for a handful of short, low-priority clips — but it runs out fast, and the everyday free Gemini chat is not where video lives. The exact credit allowance changes often, so check the current limits inside the app before you assume.
If you ever grow into a regular video creator, the single best paid upgrade is Google AI Pro — a mid-tier paid plan that unlocks real video generation through Flow, plus Gemini inside Gmail and Docs. Prices and tiers change often, so check the current pricing page. Mentioned once, never pushed: nothing in this course needs it, and your first fifty useful videos don't either.
You're not the camera operator today — you're the director of a tiny film crew: a screenwriter who drafts in seconds (the chat assistant), an actor who never fluffs a line and never needs a second take (the avatar), and an editor who captions everything without complaint (CapCut). Directors don't operate every machine on set. They brief each station, judge the output, and ask for another take when it's not right. It's Day 6's architect-and-builder rule, grown into a crew — and you've been practicing the director's only real skill, the brief, since Day 2.
One sitting, three stations. You'll write a 60-second script, turn it into a presenter video, and export a captioned file you can show to a real person today. How long it takes varies — most people get through it in a single sitting. A reminder of the convention we've used all course: square brackets in a prompt mean replace this with your own — like [your topic]. And one honest note before you start: the avatar render takes a few minutes — that's your tea break, not a malfunction.
Station 1 · WRITE: the script (Claude or your daily driver)
- Open Claude (or your daily driver) and start a new chat with the new-chat button at the top of the sidebar. You should see: an empty message box, ready for a brief.
- Send the script brief. Pick a topic you know well enough to teach a friend — a work tip, a recipe trick, how your club runs its sign-ups.
Copy-paste prompt · script brief
Write a 60-second video script about [one thing you know well enough to teach a friend]. Audience: [who should watch this, e.g. new colleagues / club members / curious beginners]. Structure: a hook question in the first 5 seconds, then three short points, then a one-line takeaway. Language: spoken and conversational (short sentences, contractions, no jargon). Length: maximum 150 words. Format: plain paragraphs only; no stage directions, no camera notes, no emoji.
You should see: a ~150-word script that opens with a question and ends on a single clean takeaway.
- Now iterate — the Day 2 habit, applied to speech:
Copy-paste prompt · the read-aloud pass
Read it out loud in your head. Smooth out anything that doesn't sound like a person talking: long sentences, formal words, anything I'd never say to a friend across a table. Then show me the final version.
You should see: a noticeably more natural script, with shorter sentences and simpler words.
- Optional but smart on a free plan: ask for a tighter cut too, with a prompt like "Now give me a 30-second version that keeps the same hook and the single strongest point." Shorter videos spend fewer of your free avatar minutes. Copy your final script somewhere handy. You should see: two versions of your script, ready to paste.
Two rules that cost nothing to follow and a lot to break:
- Label AI presenters. If a realistic avatar fronts your video and your audience could reasonably believe it's a real employee or a real spokesperson, say it's AI — a line in the caption or description is enough.
- Never fake a real person. Not your boss, not a colleague, not a celebrity — not even as a joke. The tools make it technically easy; that's exactly why your restraint is worth something.
Station 2 · FRONT: the avatar presenter (Synthesia)
- Go to synthesia.io and create a free account (email or Google sign-in, no card needed). You should see: a workspace with a button to create a new video.
- Create a new video and choose an avatar from the gallery. Free plans offer a small cast; pick whoever feels right for your topic, and you can change your mind later. You should see: your chosen presenter standing on a slide-like canvas.
- Find the script box (the text area attached to your scene) and paste your 60-second script into it. You should see: your script in the box, with an estimated duration near it. Check that it fits your free minutes.
- Pick a clean, simple background from the built-in options. Resist decorating; the words are the show. You should see: a preview that looks like a presenter in a tidy setting.
- Generate the video, then wait a few minutes. The free plan gives a small monthly allowance of video, with a watermark. These limits change often, so check the current minutes inside the app; if the numbers look different on your screen, the workflow still works the same. You should see, after the wait: your words, spoken to camera by your avatar, with a watermark in the corner. That is completely fine for practice, personal and internal use.
- Download the finished file to your computer. (Prefer to skip avatars entirely? Film yourself reading the script on your phone instead; one take is enough. Everything in Station 3 works the same.) You should see: a video file in your downloads folder.
Station 3 · EDIT: captions and export (CapCut desktop)
- Download the free CapCut desktop app from capcut.com, install it, open it, and create a new project. You should see: an editor with three areas (your media files, a preview window, and a timeline along the bottom).
- Import your video (the avatar file or your phone clip) and drag it down onto the timeline. You should see: your clip as a long strip on the timeline; the preview plays it.
- Find the automatic captions feature. It lives with the text tools, usually labelled "Auto captions". Run it on your clip. You should see: caption segments appearing along the timeline, synced to your voice, after a short processing wait.
- Read every caption against what's actually said. Auto-captions mishear names and specialist words; click any caption to correct its text. This is Day 4's Three-Click Check instinct in miniature: nothing ships unchecked. You should see: captions that match the audio word for word.
- Trim the dead air: drag the ends of the clip inward so the video starts on your hook and ends on your takeaway. You should see: a tighter clip that starts mid-energy, not mid-silence.
- Export (the export button, top-right area). 1080p is plenty. CapCut's free export adds no watermark of its own. You should see: a finished .mp4 with the captions baked into the video (so they show on every app, even ones that strip subtitle files). Your video, shipped.
- Watch it once on your phone, the screen your viewers will actually use. You should see: something you made that didn't exist this morning.
Before you compare your clip to anything you've seen online, calibrate. The clips that go viral as "look what AI can do" and the clip you make in one short sitting are different products with different purposes.
Beginners judge AI video by viral demo reels — and then fall off one of two cliffs. The first: giving up ("I could never make that"), and so never shipping the genuinely useful clip that was one short sitting away. The second: chasing the demo — burning an evening and a pile of free credits trying to coax cinematic text-to-footage out of tools that gate the good stuff behind a paywall. Both miss the same point: the beginner win was never a Hollywood shot. It's a clear, captioned, human-useful talking video, shipped in one short sitting for $0 — the kind of thing your team, your club or your family will actually watch. Demos are marketing; pipelines are yours. Your edge is knowing the difference on Day 8 instead of learning it the expensive way.
You already built the video in the walkthrough above. This mission is the part that makes it real:
- Show your walkthrough clip to one real person and watch their face at the 5-second mark — that's your hook review.
- If the hook lost them, rewrite just the first line with the read-aloud pass and re-export.
- Save the script brief that worked into your Prompt Notebook (Day 5) — on Day 10 it becomes one of your top five saved prompts.
Today's recap — 30 seconds
- "AI video" is three jobs, not one: WRITE → FRONT → EDIT — and in mid-2026 all three have a free path.
- The $0 pipeline: chat assistant for the script → Synthesia (free tier, small monthly allowance, watermark — check current in-app limits) → CapCut (free editor, auto-captions, no watermark on export). One short sitting end to end.
- Write for the ear: hook in 5 seconds, ~150 words per minute, sentences you'd say to a friend.
- Raw text-to-footage is the paid frontier — Veo via Flow; the free taste is tiny (a modest monthly credit pool — check current in-app limits).
- Two honesty rules, always: label AI presenters; never fake a real person.
- The Sora parable: "the future of video" in 2025, discontinued in 2026 — tools die, workflows survive.