AI image generation for ads
The model is not the skill. The brief is. Today you learn to direct an image generator the way a creative director runs a shoot — so the output is on-brand, ad-ready, and built to feed the loop, not AI slop.
An AI image generator is a photographer who will shoot anything you describe and nothing you don't — so the leverage is entirely in the brief: brand style references, product fidelity, and legible text are inputs you supply, never things you hope the model invents correctly.
1. The model is a commodity; the brief is the craft
Yesterday (Day 12) you turned on Meta's free, in-platform enhancements — background generation, image expansion, touch-ups — that reformat and remix the assets you upload. Today you make the assets themselves from a blank canvas. That is a different job and a different risk profile, so it gets its own day.
Here is the trap to disarm first. Founders read about "the best AI image model" and treat tool selection as the decision. It barely matters. The research stack as of mid-2026 has a clear division of labour, and you'll use several in one pipeline: Midjourney for the highest-aesthetic hero shot, FLUX.2 for photoreal product and lifestyle, Ideogram V3 or Recraft V4 when there's heavy text or a logo, ByteDance Seedream when you need cheap bulk variants, and Google's Nano Banana Pro as the 2026 default for edits, variants, and locking brand consistency across a series. Pick by the job in front of you, not by the leaderboard.
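"Pick by the job" can be written down as a small lookup so the choice is made once, not per asset. The sketch below is illustrative only: the job labels are hypothetical names invented for this example, and the model names are simply the ones listed above. It does not call any vendor API.

```python
# Hypothetical routing table. Job labels are made up for this sketch;
# model names are copied from the lesson text, not from any API.
MODEL_BY_JOB = {
    "hero_shot": "Midjourney",
    "photoreal_product": "FLUX.2",
    "text_or_logo": "Ideogram V3 or Recraft V4",
    "bulk_variants": "ByteDance Seedream",
    "edit_or_brand_series": "Nano Banana Pro",
}


def pick_model(job: str) -> str:
    """Return the model family for a job, failing loudly on an unknown job."""
    if job not in MODEL_BY_JOB:
        raise ValueError(f"unknown job {job!r}; expected one of {sorted(MODEL_BY_JOB)}")
    return MODEL_BY_JOB[job]


print(pick_model("bulk_variants"))
```

Raising on an unknown job is a deliberate choice here: a silent default would quietly route text-heavy work to a model that may not spell.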
The reason this is liberating: it means the durable skill is not "which model." It's how you brief. Two operators with identical access to the same model produce wildly different output — one ships generic stock-looking sludge, the other ships an ad that converts — because one wrote a brief and one wrote a wish. Every generic image you've ever scrolled past was a one-line prompt. Every on-brand one was a structured brief.
And remember why we're here. Back on Day 1 we established that creative is the last lever you own and that it fatigues — every winner decays as it scales. The point of generating images with AI is not to make one pretty picture; it's to feed the volume the loop needs (Day 5's explore/exploit, Day 10's matrix) without a studio budget per asset. Cheap volume only compounds if it's on-brand and tagged, so the brief is where quality and learnability both get protected.
2. The six inputs that make an image on-brand
A wish is "a serum bottle on a marble counter, sunlight." A brief specifies six inputs the model cannot guess. Miss any one and the model fills the gap with its training-data average — which is exactly what "AI slop" is: the statistical mean of every image like yours that ever existed. Bland by construction.
The six inputs:
- Subject & product fidelity — what exactly is in frame, and is it the real product. This is the non-negotiable one (Section 3).
- Style reference — the look: a brand palette, a mood, 2–4 reference images of past on-brand creative. Midjourney's --sref and Style Weight, or feeding Nano Banana Pro your logo, brand colours and prior visuals, exist precisely to pin this down.
- Composition & format — framing, where the product sits, negative space for copy, and the aspect ratio you'll ship into (9:16 Reels, 4:5 Feed, 1:1 — your Day 10 formats).
- Lighting & treatment — and here you make a deliberate Day 9 choice: hi-fi (studio, polished, controlled light) or lo-fi (phone-shot, on-a-kitchen-counter, "a friend posted this"). AI does both; the brief decides which.
- Text-in-image — if there's a headline or badge baked into the pixels, you specify the exact words. In 2026 Ideogram V3, Recraft V4 and Nano Banana Pro render legible, correct text; older models still produce garbled glyphs. Route text-heavy work to a model that can spell.
- Persona & angle cues — the human the image implies (Day 7) and the message it carries (Day 8). A "budget-conscious parent, before/after relief" frame looks nothing like a "status-seeking professional, aspirational" frame, even for the same product.
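The difference between a wish and a brief can be checked mechanically. The following is a minimal sketch, assuming a brief is stored as six free-text fields; the class name, field names, and the rule that an empty field counts as "missing" are assumptions made for illustration, not part of any tool.

```python
from dataclasses import dataclass, fields


@dataclass
class Brief:
    # One field per input from the list above. Field names are hypothetical.
    subject_and_fidelity: str = ""
    style_reference: str = ""
    composition_and_format: str = ""
    lighting_and_treatment: str = ""
    text_in_image: str = ""  # assumption: write "none" explicitly if no text is baked in
    persona_and_angle: str = ""


def missing_inputs(brief: Brief) -> list[str]:
    """List the inputs left blank, i.e. the gaps the model would fill with its average."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]


wish = Brief(subject_and_fidelity="a serum bottle on a marble counter, sunlight")
print(missing_inputs(wish))
```

Run against the one-line wish, the check reports five of the six inputs as unspecified, which is the gap the model fills on its own.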
A worked example. Say a skincare brand wants Feed creative for one concept across two personas. The wish — "serum on marble, sunlight" — gives you a forgettable stock photo on the first try and three near-identical re-rolls after. The brief gives you a generation matrix. Hold concept, product and palette fixed; vary persona, treatment and format: 2 personas × 2 treatments (hi-fi / lo-fi) × 2 ratios (4:5 / 9:16) = 8 tagged frames from one structured brief in an afternoon. At a studio shoot that's a half-day and four figures; here it's a few euros of credits. That is the leverage — but only because every frame is a deliberate, labelled point in the matrix, not eight rolls of the same dice.
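The 2 × 2 × 2 matrix can be enumerated directly. This is a sketch under stated assumptions: the persona labels reuse the two examples from the list above, the fixed values are placeholders, and the tag format is invented for illustration.

```python
from itertools import product

# Held fixed across every frame (placeholder values).
fixed = {"concept": "concept_1", "product": "real product photo", "palette": "brand palette"}

# Varied axes: 2 personas x 2 treatments x 2 ratios.
personas = ["budget-conscious parent", "status-seeking professional"]
treatments = ["hi-fi", "lo-fi"]
ratios = ["4:5", "9:16"]

frames = [
    {**fixed, "persona": p, "treatment": t, "ratio": r, "tag": f"{p}|{t}|{r}"}
    for p, t, r in product(personas, treatments, ratios)
]

for frame in frames:
    print(frame["tag"])
print(len(frames))  # 8
```

Each frame carries its own tag, so every output is a labelled point in the matrix, not an anonymous re-roll.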
3. Product fidelity: composite, don't hallucinate
Here is the single most expensive mistake in AI ad imagery, and the one rule that separates amateur output from professional: never let the model invent your product.
If you describe your product in words and let the generator draw it, you will get a bottle, a sneaker, a dashboard — plausible, beautiful, and wrong. The label is gibberish, the logo is a melted approximation, the cap is the wrong shape, the device has six buttons instead of four. To you it's obviously not your product. To a customer who clicks through and sees the real thing, it's a bait-and-switch — and trust, the thing your whole funnel runs on, takes the hit. For regulated or premium categories it's worse: a hallucinated product can imply a feature you don't sell.
The fix is compositing. Generate the scene with AI (the marble, the light, the lifestyle context, a person's hands) and place your actual product photograph into it. Modern editors make this a single step: you inpaint the real product onto an AI-generated background, or mask it in and let Nano Banana Pro colour-match the lighting so it sits naturally. The AI does what it's genuinely good at (cheap, infinite, plausible environments); the camera does what it must (an honest record of the thing you'll ship). The same logic protects faces and any text the customer must trust: generate the world, anchor the truth.
The same discipline applies equally to hi-fi and lo-fi (Day 9). A hi-fi composite drops the real bottle into a studio-lit AI set. A lo-fi composite drops the same real bottle into an AI-generated "messy bathroom shelf, phone flash, slightly off-centre", and now you have a native-looking UGC still that didn't need a creator, a kitchen, or a half-day. Both are on-brand because in both the product is real.
The generator is a world-class photographer who has shot everything and will shoot anything — but is blind to your brand and has never seen your product. Hand them a one-line wish ("nice serum photo") and they shoot the generic stock image they've shot a thousand times. Hand them a brief — mood boards, the palette, the framing, the lighting, and the actual product on the table — and they shoot your ad. You're not pressing a button. You're on set, directing. The prompt is just how you talk to the crew.
This isn't a one-off prompt you write and lose. It's a saved template in your creative doc — a row per generation, six input columns, so anyone (or any future you) can produce on-brand frames without re-deriving the recipe. It plugs straight into the genome tags from Day 4: the brief's persona/angle/treatment/format fields are the genome axes, captured at the moment of creation rather than guessed at later.
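One way to keep that template is a plain CSV with one row per generation. The sketch below is an assumption about layout, not a prescribed schema: the column names are hypothetical, and the paired inputs (composition and format, lighting and treatment, persona and angle) are split into separate columns so the four genome axes can be read straight off the row.

```python
import csv
import io

# Hypothetical column names covering the six inputs, with pairs split out.
COLUMNS = [
    "subject", "style_ref", "composition", "format",
    "lighting", "treatment", "text_in_image", "persona", "angle",
]
GENOME_AXES = ["persona", "angle", "treatment", "format"]


def genome_tags(row: dict) -> dict:
    """Pull the genome axes out of a brief row at creation time."""
    return {axis: row[axis] for axis in GENOME_AXES}


def to_csv(rows: list[dict]) -> str:
    """Serialise brief rows to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


row = {
    "subject": "real serum bottle (composited photo)",
    "style_ref": "brand palette + 3 past creatives",
    "composition": "product left, negative space right",
    "format": "4:5",
    "lighting": "phone flash",
    "treatment": "lo-fi",
    "text_in_image": "none",
    "persona": "budget-conscious parent",
    "angle": "before/after relief",
}
print(genome_tags(row))
print(to_csv([row]))
```

Because the tags are a projection of the row, not a second document, they cannot drift from the brief that produced the frame.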
Note the QA gate. Anything the model synthesised — a generated background, a generated scene — falls under Meta's March-2026 rule requiring you to disclose AI-generated or AI-modified creative (a checkbox in Ads Manager), with the EU AI Act often demanding stricter labelling still. A composited real product on an AI background counts. Build the disclosure check into the gate now; we'll formalise this gate as the QA stage of the production line tomorrow.
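The disclosure check reduces to one rule: a frame with any synthesised part must have disclosure ticked before it ships. A minimal sketch, assuming two hypothetical boolean fields tracked in your own creative doc; these are not Ads Manager API fields, and the rule is encoded only as this lesson describes it.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: str
    has_ai_generated_parts: bool  # e.g. an AI background behind a real product photo
    disclosure_checked: bool      # the AI-disclosure box was ticked for this ad


def passes_disclosure_gate(frame: Frame) -> bool:
    """A frame passes if it has no AI-generated parts, or if disclosure is ticked."""
    return (not frame.has_ai_generated_parts) or frame.disclosure_checked


frames = [
    Frame("composite-01", has_ai_generated_parts=True, disclosure_checked=False),
    Frame("composite-02", has_ai_generated_parts=True, disclosure_checked=True),
    Frame("studio-03", has_ai_generated_parts=False, disclosure_checked=False),
]
blocked = [f.frame_id for f in frames if not passes_disclosure_gate(f)]
print(blocked)  # ['composite-01']
```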
Most founders type a one-line wish, accept the first photoreal-looking result, and ship it, proud they "used AI." The output is generic AI slop (the training-data average, recognisable as an ad from space) or, worse, a hallucinated product that doesn't match what arrives in the box. Both quietly torch trust and CTR. The reframe, and your edge: AI image generation is not a vending machine, it's a direction skill. The operator who writes a six-input brief and composites the real product ships creative indistinguishable from a studio shoot at a fraction of the cost and, because every frame is briefed against the genome, ships volume the loop can actually learn from. Anyone can press generate. Almost no one briefs.
Today's recap — 30 seconds
- The model is a commodity (Midjourney / FLUX.2 / Ideogram / Nano Banana Pro — pick by job); the brief is the craft.
- Six inputs make an image on-brand: subject+fidelity, style ref, composition+format, lighting+treatment, text-in-image, persona+angle — miss one and the model fills it with the bland average.
- Composite, don't hallucinate: generate the scene, drop in the real product photo. Never let the model invent what the customer will receive.
- AI does both hi-fi and lo-fi (Day 9) — the brief decides which; both stay on-brand because the product is real.
- The brief's fields are the genome tags (Day 4); disclose AI-modified creative at the QA gate (Meta, March 2026).