HairTX · Lam Institute

AI Video Pipeline — Phase 1 Results & Phase 2 Decision Brief

Prepared 2026-05-18
Internal review · v1.0

Over the past 24 hours we built a working prototype that turns one of Dr. Lam's existing photos and a sample of his voice into a fully AI-generated short-form FAQ video. Below are the two finished V1 videos for review, followed by a side-by-side comparison of the tools we'd use to scale this to production, the costs involved, the team skills required, and the decisions we need from leadership to move forward.

Recommended path in one paragraph. Pilot Hedra Character-3 + ElevenLabs Instant Voice Clone (Bundle 1) for ~$85–$115 / month nominal subscription cost. Two-to-three weeks from greenlight to first production batch, gated mostly on consent paperwork rather than engineering. Loaded cost including legal, governance, and operational overhead is higher — see the cost section below. Decisions needed: pilot budget approval, consent release for Dr. Lam, and a 90-day success-metrics definition.
Phase 1 · Working Prototype

The two finished videos

Both videos use the same script (four common patient FAQs about hair transplant), the same source photo of Dr. Lam, and the same 9:16 vertical format for Reels / Shorts / TikTok. The only thing that changes between them is the voice source. Watch both end-to-end and tell us which one sits closer to your publishable bar.

Variant A · Native AI voice · 70s
Video: Google Veo 3.1 fast
Voice: Synthesized by Veo (generic AI voice)
Consent: None required (no real voice cloned)
Length: ~70 seconds (Veo's pace, naturalistic pauses)
Strengths: Tight on-screen lip-sync. Zero voice-cloning risk. Fastest path to publishable.
Trade-offs: Voice doesn't sound like Dr. Lam, and patients may notice. Pace is slower than the script intended.

Variant B · Cloned voice · 47s
Video: Google Veo 3.1 fast (same render as Variant A)
Voice: Dr. Lam's voice, cloned via F5-TTS from a 12s YouTube sample
Consent: Required before any external use
Length: ~47 seconds (closer to the scripted target)
Strengths: Sounds like Dr. Lam. Pace matches the script. Materially closer to the publishable bar.
Trade-offs: Lips don't sync perfectly to the cloned audio (Veo's mouth was synced to its own voice). Legal consent paperwork must land before publication.

At a glance

| Dimension | Variant A · Native | Variant B · Cloned |
|---|---|---|
| Voice identity | Generic AI | Dr. Lam |
| Lip-sync accuracy | High (mouth ↔ AI voice) | Approximate (audio overlaid) |
| Length vs script intent | +40% (slower pacing) | On target |
| Consent required to publish | No | Yes |
| Recommended use | Stand-in if consent is delayed | Production target once paperwork lands |

Variant B is internal-review-only as built. Two compounding constraints:
  1. License. F5-TTS (the local voice-cloning model used in V1) ships its pre-trained weights under CC-BY-NC-4.0 — non-commercial use only. Variant B as built therefore cannot be published externally regardless of consent. The voice layer must be re-rendered through a commercial tool (recommended: ElevenLabs Instant Voice Clone) before any paid or public use.
  2. Consent. Every commercial AI voice tool — and the legal standard for AI-cloned voice in paid promotion — requires explicit, written consent from the voice owner plus a clean reference recording. The 12-second YouTube extract was acceptable for prototyping; production requires Dr. Lam to sign a scoped release and record a 2–3 min reference clip in a quiet room.
Phase 2 · Headline Numbers

What changes when we go to production

V1 was the proof of concept. To run this weekly at ~100 videos per month, four numbers shift dramatically.

5–15× cheaper at scale: Hedra vs current Veo baseline at 100 videos/mo
1 call per video: down from 11 Veo calls + stitching in V1
~$115/mo Bundle 1 nominal: Hedra Pro ($60) + ElevenLabs Creator ($22) + overage. Loaded cost is higher.
~2–3 wks to V2 production: gated on consent paperwork, not engineering

Monthly cost at 100 videos / month

Nominal subscription cost across the three production bundles. ~50s of finished output per video.

Bundle 1 · recommended (Hedra + ElevenLabs): ~$115 / mo, ~$1,380 / yr
Bundle 3 · pilot then collapse (Hedra + ElevenLabs + Veo fallback): ~$165 / mo, ~$1,980 / yr
Bundle 2 · creative variety (HiggsField + ElevenLabs): ~$620 / mo, ~$7,460 / yr
For reference — the V1 prototype baseline (Veo + F5-TTS) would cost ~$750–$2,000/mo at the same volume, 5–15× the recommended bundle. Not shown on the chart above to keep bundle differences readable.

Nominal cost only — does not include legal review, governance, content moderation, or operational overhead. See "fully loaded cost" further below.

The top three tools, head to head

Each tool scored on the five dimensions that drive our outcome. Five dots = leader on that dimension; fewer = trade-off.

HiggsField Soul ID + Lipsync

Creative-control video gen

Train a persistent doctor "character" once; reuse across many future videos with cinematic motion controls.

Identity preservation
Lip-sync to cloned voice
Setup simplicity
Production cost
API stability
Per video: ~$5–$9
Setup time: 1–2 days

Google Veo 3.1 fast

V1 baseline · keep as fallback

General image-to-video. Built V1 cleanly but identity drift and cost rule it out as the primary V2 tool.

Identity preservation
Lip-sync to cloned voice
Setup simplicity
Production cost
API stability
Per video: $15–$35
Setup time: Already built

The pipeline gets dramatically simpler

V1 required 6 stages because Veo and the voice clone are separate systems we had to glue together. The recommended V2 path collapses to 3 stages — Hedra takes a photo and an audio file and returns a finished video.

V1 (built)
1. Segment (Python)
2. Voice sample (ffmpeg)
3. 11 Veo calls (Google Veo 3.1)
4. Voice clone (F5-TTS)
5. Mux + stitch (ffmpeg)

V2 (recommended)
1. Voice clone (ElevenLabs)
2. Render video (Hedra Character-3)
3. Publish (direct to social)

Operator overhead drops from ~10 minutes per video (monitor 11 Veo polls, retry failures, trim, stitch) to ~2 minutes (one API call, one preview, one upload).
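
The V1 "Segment" stage exists because each Veo clip is capped at 8 seconds. A minimal sketch of what that step could look like follows. The function name `segment_script` and the 2.5 words-per-second speaking rate are assumptions for illustration, not values taken from the V1 code.

```python
import re

MAX_CLIP_SECONDS = 8.0   # Veo clip cap cited in this brief
WORDS_PER_SECOND = 2.5   # ASSUMPTION: rough speaking rate, not measured in V1


def segment_script(script: str,
                   max_seconds: float = MAX_CLIP_SECONDS,
                   words_per_second: float = WORDS_PER_SECOND) -> list[str]:
    """Greedily pack whole sentences into segments that each fit one clip."""
    budget = max(1, int(max_seconds * words_per_second))  # max words per clip
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    segments: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than one clip is split on word boundaries.
        while len(words) > budget:
            if current:
                segments.append(" ".join(current))
                current = []
            segments.append(" ".join(words[:budget]))
            words = words[budget:]
        # Start a new segment when this sentence would overflow the current one.
        if len(current) + len(words) > budget:
            segments.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        segments.append(" ".join(current))
    return segments


# Placeholder text only; a real run would pass the approved FAQ script.
for number, segment in enumerate(
        segment_script("First question goes here. A short answer follows it."), start=1):
    print(number, segment)
```

Every segment becomes one Veo call plus one stitch point, which is where the V1 operator overhead comes from. In the V2 path this step disappears because the whole audio file goes to the renderer in a single call.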

Phase 2 · Tool Comparison

What we'd use to scale this to production

The prototype proves the concept. Scaling to ~100 videos per month requires picking production-grade tools at each of three layers. We evaluated 12 video-generation services, 11 talking-head specialists, and 13 voice-cloning tools. The clear winners for our exact use case (real doctor, single photo, vertical short-form, ~60 seconds):

| Layer | Recommended tool | Why | Cost |
|---|---|---|---|
| Talking-head specialist | Hedra Character-3 | Photo + audio file → 60s vertical talking-head in a single API call. Best lip-sync; designed for exactly this use case. | $0.94–$2.25 per finished minute |
| Voice clone | ElevenLabs IVC | Industry leader for fidelity from a 90s–3min sample. Built for commercial use, consent-first. | $22 / month at our volume |
| Video gen (creative variety) | HiggsField Soul ID + Lipsync | Trains a persistent "character" of Dr. Lam once, reuses across all future videos. Cinematic motion controls. | ~$5–$9 per video |
| Video gen (value) | Kling 2.x Pro (via fal.ai) | Strong general video model, native audio support, very cheap. Best alternative if Hedra falls short on a specific shoot. | ~$0.17 / sec ≈ $8 / video |
| Video gen (value) | Seedance 2.0 (ByteDance, via fal.ai) | Excellent prompt adherence and motion quality; popular for short-form. Sits alongside Kling as the cheap-and-cheerful tier. | ~$0.18–$0.30 / sec |
| Video gen (current baseline) | Google Veo 3.1 fast | What we used for V1. Works for one-off clips; problems begin at production scale. See below. | ~$15–$35 / video at our scene structure |

Why isn't Veo the V2 recommendation? Veo built V1 cleanly and is still the right tool for one-off intros, b-roll, or motion graphics where we control the script. But for our specific job — a real doctor talking to camera for a series of videos — Veo has five specific gaps that the alternatives solve:
  1. No persistent identity. Veo re-rolls the doctor's face every call; Hedra and HiggsField's Soul ID lock it.
  2. 8-second clip cap. Anything over 8s needs stitching, which compounds the drift above. Hedra renders 60s in one call.
  3. Lip-sync mismatch with cloned audio. You can hear it in Variant B above — Veo's mouth is synced to Veo's voice, not Dr. Lam's cloned voice. Hedra syncs to whatever audio file we pass in.
  4. Cost at production scale. Honest math: 100 videos × 50s × $0.15–$0.40 per sec ≈ $750–$2,000/mo nominal for Veo. Hedra at the same volume is ~$115/mo nominal — 5–15× cheaper before loaded costs.
  5. Preview model status. Veo 3.1's model IDs end in "-preview"; Google can change pricing or rotate access on short notice. Hedra's API has been publicly stable since Feb 2026.

So Veo stays in the kit as a fallback for non-likeness shots, but Hedra is the right primary tool for the talking-head workload that drives most of the volume.
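
The cost math in gap 4 can be reproduced in a few lines. This is a sketch using only the figures quoted in this brief. The one added assumption, flagged in the code, is that a V1-style video is billed for all 11 calls at 8 seconds each, which is one way to arrive at the ~$15–$35 per-video figure quoted for our scene structure.

```python
VIDEOS_PER_MONTH = 100
SECONDS_PER_VIDEO = 50               # finished output per video
VEO_RATE_PER_SEC = (0.15, 0.40)      # USD per second, low and high
BUNDLE_1_MONTHLY = 115               # USD, nominal Hedra + ElevenLabs figure

# Monthly Veo cost if only finished seconds are billed.
veo_monthly = tuple(VIDEOS_PER_MONTH * SECONDS_PER_VIDEO * r for r in VEO_RATE_PER_SEC)

# Ratio of Veo cost to the Bundle 1 nominal figure.
ratio = tuple(cost / BUNDLE_1_MONTHLY for cost in veo_monthly)

# ASSUMPTION: each V1 video bills 11 calls x 8 s = 88 s, regardless of final length.
BILLED_SECONDS_V1 = 11 * 8
veo_per_video_v1 = tuple(BILLED_SECONDS_V1 * r for r in VEO_RATE_PER_SEC)

print(f"Veo monthly: ${veo_monthly[0]:,.0f} to ${veo_monthly[1]:,.0f}")
print(f"Ratio vs Bundle 1: {ratio[0]:.1f}x to {ratio[1]:.1f}x")
print(f"Veo per V1-style video: ${veo_per_video_v1[0]:.2f} to ${veo_per_video_v1[1]:.2f}")
```

On these inputs the ratio comes out at roughly 6.5× to 17×, the same order of magnitude as the 5–15× headline. Both sides of the ratio are nominal and exclude loaded costs.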

See real examples from each vendor

Before approving any tool, watch a few real outputs from its own channel. The links below open each vendor's homepage and their public demo reels.

| Vendor | What it does | Homepage | Demos |
|---|---|---|---|
| Hedra | Photo + audio → talking-head video (our top pick) | hedra.com | Showcase · YouTube |
| ElevenLabs | Voice cloning + TTS (our top pick for the voice layer) | elevenlabs.io | Voice samples · YouTube channel |
| HiggsField | Soul ID character training + Lipsync module + DoP cinematic motion | higgsfield.ai | Showcase · YouTube |
| Kling AI | General image-to-video + native audio. Strong character consistency. | kling.ai | YouTube demos · fal.ai page |
| Seedance / Seedream | ByteDance's video model. Often paired with Seedream image gen. | seed.bytedance.com | fal.ai page · YouTube demos |
| Google Veo 3.1 | Our V1 baseline. General image-to-video with native audio. | ai.google.dev | DeepMind page · DeepMind YouTube |
| Runway Gen-4 | General video model. Strong creative tooling, no native audio. | runwayml.com | Research demos · YouTube channel |
| Luma Ray 3 | Image-to-video. Cinematic looks but face drift on portraits. | lumalabs.ai | YouTube channel |
| HeyGen | Enterprise avatar / talking-head, more expensive than Hedra. | heygen.com | Templates · YouTube |
| D-ID | Mature talking-head API; ~3× our recommended cost. | d-id.com | YouTube channel |

Some YouTube links point to a search query rather than a fixed channel ID, because vendor channel names change and a search still resolves to current demos.

Three paths forward

Each bundle is end-to-end (video + voice) and ready to ship. Pick one based on priorities.

Bundle 2 — Maximum creative control

HiggsField (Soul ID + Speak + cinematic shots) + ElevenLabs.

Monthly cost: ~$620
Annual cost: ~$7,460
Setup time: 8–12 hrs
Per video: ~3 min ops
  • Persistent character profile of Dr. Lam reused across all future videos
  • Cinematic shot styles (DoP module)
  • More visual variety per video

Bundle 3 — Pilot all three, lock the winner

Build a pluggable layer; run the same script through Hedra, HiggsField, and Veo. Decide on data.

Pilot month: ~$180
Then settles to: ~$115/mo
Setup time: ~8 hrs
Decision after: ~10 videos
  • Pluggable provider architecture — swap tools without rewriting
  • Side-by-side eval to remove vendor risk
  • Most defensible decision for leadership

Suggested approach: Start in Bundle 3 (pilot) and collapse to Bundle 1 (Hedra + ElevenLabs) once data confirms it. Total elapsed time to production: 2–3 weeks, gated almost entirely on the consent paperwork — not engineering.

What it takes to run this in-house

Below: tools mapped against the skills they require. Green dots = required skill. Our existing in-house stack already covers every recommended tool — no outside hires needed. The only "skill wall" path is full open-source, which trades $0 in licensing for ~20–40 hours of setup and ongoing maintenance.

Skill columns assessed: Python, REST APIs, Prompt design, Video utilities (ffmpeg), ML setup (PyTorch / HF), Cloud admin.

| Tool | In-house ready? |
|---|---|
| Hedra Character-3 | Yes |
| ElevenLabs | Yes |
| HiggsField (Soul ID + Speak) | Yes |
| Veo 3.1 (current) | Yes |
| HeyGen / D-ID (talking-head SaaS) | Yes |
| Open-source models (LivePortrait, MuseTalk, F5-TTS, etc.) | Maintainable but slow |
| Synthesia / Tavus / Argil | Disqualified (consent video required) |

Bottom line on staffing. Bundle 1 or Bundle 3 = roughly one focused engineering day to set up, ~2 minutes of operator time per finished video. No new hires.

Cost at production volume (100 videos / month)

Assumes ~50 seconds of finished output per video.

Nominal subscription cost

| Path | Monthly fixed | Per video | Monthly total | Annual |
|---|---|---|---|---|
| Hedra + ElevenLabs (Bundle 1) | $82 | ~$1.10 overage | ~$115 | ~$1,380 |
| Hedra + ElevenLabs + Veo fallback (Bundle 3 final) | ~$135 | ~$1.10 | ~$165 | ~$1,980 |
| HiggsField + ElevenLabs (Bundle 2) | $22 | ~$6.00 | ~$620 | ~$7,460 |
| Current V1 baseline (Veo + F5-TTS) | $0 | ~$8–$20 | ~$750–$2,000 | ~$9k–$24k |
| Full open-source path | $0 | $0 | $0 | $0 (but ~20h/mo of engineering time) |

Fully loaded cost (the real number leadership should plan around)

Subscription cost is the smallest line. The honest fully loaded total also includes legal review, governance overhead, social-platform management, analytics tooling, content moderation, brand monitoring, insurance, drift testing, and vendor-continuity reserves.

| Cost layer | Monthly estimate | Notes |
|---|---|---|
| Nominal subscriptions (Bundle 1) | ~$115 | Hedra Pro $60 + ElevenLabs Creator $22 + overage |
| Outside legal counsel (ongoing) | $500–$1,500 | Quarterly disclosure review, Texas medical-board posture, FTC monitoring |
| Dr. Lam's script-review time | $400–$800 | ~2 hrs/week at surgeon hourly rate, per-batch script approval |
| Social platform management | $500–$1,500 | Scheduling, captions per platform, community + comment moderation |
| Analytics + brand monitoring | $200–$500 | Conversion tracking, sentiment monitoring, link-attribution tooling |
| Insurance review / uplift | $100–$400 | Confirm malpractice and general liability cover AI marketing |
| Quarterly drift + regression testing | $100–$300 | Engineering time to verify face/voice fidelity hasn't drifted with vendor updates |
| Engineering carry (refactor, vendor swaps, incidents) | $500–$1,000 | ~5–10 hrs/mo at internal cost |
| Total fully loaded | ~$2.4k–$6k / month | ~$29k–$72k / year |

Read this number, not the $115. The $115/mo nominal is the floor — it's what shows up on the vendor invoices. Fully-loaded is the actual cost-of-program. The Hedra-led path is still materially cheaper than the alternatives and still defensible on cost grounds, but the proposal should be evaluated on the loaded number.
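
As a cross-check, the loaded total can be recomputed from the individual layers. The sketch below copies the low and high monthly estimates from the cost-layer table; the dictionary name is ours and nothing beyond those figures is assumed.

```python
# (low, high) monthly estimates in USD, copied from the cost-layer table.
LOADED_COST_LAYERS = {
    "Nominal subscriptions (Bundle 1)": (115, 115),
    "Outside legal counsel (ongoing)": (500, 1500),
    "Dr. Lam's script-review time": (400, 800),
    "Social platform management": (500, 1500),
    "Analytics + brand monitoring": (200, 500),
    "Insurance review / uplift": (100, 400),
    "Quarterly drift + regression testing": (100, 300),
    "Engineering carry": (500, 1000),
}

monthly_low = sum(low for low, _ in LOADED_COST_LAYERS.values())
monthly_high = sum(high for _, high in LOADED_COST_LAYERS.values())

print(f"Monthly: ${monthly_low:,} to ${monthly_high:,}")
print(f"Annual:  ${monthly_low * 12:,} to ${monthly_high * 12:,}")
```

The unrounded sums are $2,415 and $6,115 per month, which matches the ~$2.4k–$6k row. The table's ~$72k annual upper bound is 12 × the rounded $6k; the unrounded figure is closer to $73k.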

Honest comparison vs. the do-nothing baseline. Today HairTX produces talking-head video the traditional way (planned shoots, manual editing). The AI pipeline isn't competing with $0 — it's competing with that. A traditional surgeon-led short-form pipeline runs ~$4k–$10k/mo all-in (videographer, editor, scheduling, multiple cuts per shoot day). The AI pipeline's real edge is throughput-at-fixed-cost, not raw price.

What success looks like in 90 days

The cost argument alone doesn't justify a new program. These are the metrics the team will report on at the 90-day decision point, and the kill criteria if the program fails to land.

| Metric | Definition | Pilot target | Kill threshold |
|---|---|---|---|
| Cost per qualified consult | Loaded program cost ÷ consults attributed to the videos | ≤ existing channel CAC | > 2× existing CAC |
| View → consult conversion | Click-throughs from video CTA that book a consult | ≥ 0.3% per video at 30 days | < 0.05% sustained over 6 weeks |
| Engagement vs. existing posts | Median engagement rate across V2 videos vs. last 90 days of organic | Within 30% of organic median | < 50% of organic median (algorithm suppression signal) |
| Time-to-publish per video | Script draft → published, batch median | < 3 business days | > 7 business days sustained |
| Identity / voice fidelity rubric pass-rate | Internal QA against approval rubric per batch | ≥ 95% pass at first review | < 80% (drift / regression) |
| Patient or staff complaints attributable to AI content | Comments, DMs, calls flagging the AI nature | 0 critical incidents | Any incident requiring video takedown |
| Compliance audit | FTC / Texas medical-board / platform disclosure rules met on every video | 100% pass | Any video out of compliance |

Decision at 90 days: expand, hold, or kill — based on which targets cleared, not on subjective preference. Pilot data goes to leadership in a structured memo.
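
One way to keep the 90-day call mechanical is to encode the targets and kill thresholds as data. The sketch below covers the five numeric metrics. The decision rule is our assumption, since the brief does not fix one: any breached kill threshold means "kill", all targets met means "expand", and anything else means "hold". Metric keys are hypothetical names, and the "sustained" qualifiers are not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    name: str
    meets_target: Callable[[float], bool]
    hits_kill: Callable[[float], bool]


# Thresholds copied from the metrics table; ratios are relative to the baseline.
METRICS = [
    Metric("cost_per_consult_vs_cac", lambda v: v <= 1.0, lambda v: v > 2.0),
    Metric("view_to_consult_pct", lambda v: v >= 0.3, lambda v: v < 0.05),
    Metric("engagement_vs_organic", lambda v: v >= 0.7, lambda v: v < 0.5),
    Metric("time_to_publish_days", lambda v: v < 3, lambda v: v > 7),
    Metric("fidelity_pass_rate_pct", lambda v: v >= 95, lambda v: v < 80),
]


def decide(measured: dict[str, float]) -> str:
    """ASSUMED rule: any kill -> 'kill'; all targets met -> 'expand'; else 'hold'."""
    if any(m.hits_kill(measured[m.name]) for m in METRICS):
        return "kill"
    if all(m.meets_target(measured[m.name]) for m in METRICS):
        return "expand"
    return "hold"


example = {
    "cost_per_consult_vs_cac": 0.9,
    "view_to_consult_pct": 0.4,
    "engagement_vs_organic": 0.8,
    "time_to_publish_days": 5,      # misses the target but is not a kill
    "fidelity_pass_rate_pct": 96,
}
print(decide(example))  # prints "hold"
```

The two non-numeric rows (complaints and compliance audit) are pass/fail checks and would sit alongside this as hard gates.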

Governance pre-conditions — must be in place before launch

The four-team review identified governance gaps as the biggest risk to this program, ahead of any engineering risk. These are not optional polish items.

| Pre-condition | Owner | Status |
|---|---|---|
| Scoped voice + likeness release signed by Dr. Lam | Practice / NKP | Not started |
| Outside counsel review of release + disclosure plan (TX-licensed, healthcare-marketing experience) | Practice | Not started |
| Script-approval workflow with Dr. Lam in the loop (per-batch sign-off, 24-hr turnaround SLA) | NKP + Practice | Drafted, not implemented |
| Kill-switch SOP: written, tested procedure to pull any video from all platforms within 4 hours | NKP | Not started |
| On-frame "AI-generated" disclosure label spec (font, position, contrast, persistence) | NKP | Not started |
| Per-batch audit ledger (prompt, model, version, output URL, approval timestamps) | NKP | Spec drafted |
| Insurance confirmation that practice malpractice + general liability cover AI marketing content | Practice | Not started |
| Comment moderation policy and on-call rotation | Practice | Not started |
| Quarterly vendor drift QA (face fidelity, voice fidelity, prompt regression suite) | NKP | Not started |
| Documented vendor failover plan: Hedra down or price-spiked, switch to HiggsField in < 24h | NKP | Pluggable layer in design |

AI disclosure — six-layer stack

Caption-only "AI-assisted" disclosure is legally insufficient under FTC May 2026 endorsement guidance and Texas SB 1188 (synthetic-media disclosure for licensed-physician statements). Every video that publishes externally must pass all six layers below.

| # | Layer | Specifics |
|---|---|---|
| 1 | On-frame burn-in label | "AI-generated representation. Approved by Dr. Sam Lam." Persistent, high-contrast, lower-third position |
| 2 | In-script acknowledgement | First 3 seconds of every video: Dr. Lam's cloned voice acknowledges the AI assistance |
| 3 | Platform-native toggles | TikTok "AI-generated content" toggle on; Meta AI-info label on; YouTube synthetic-media disclosure on |
| 4 | Caption boilerplate | "AI-assisted video. Script reviewed and approved by Dr. Sam Lam. For a consultation, link in bio." |
| 5 | Prompt-level guard-rails | Script template forbids outcome claims, synthetic testimonials, before/after stills not separately verified by Dr. Lam |
| 6 | Audit ledger entry | Per-video record of prompt, model + version, render timestamp, approval signature, disclosure status; retained 7 years |

This stack is designed to satisfy current FTC endorsement guidance, Texas SB 1188, and Meta / TikTok / YouTube platform-specific AI policies as of May 2026. Counsel to review and re-confirm quarterly.
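
A sketch of how the audit ledger entry (layer 6) could double as the publish gate for the other layers. The field names, layer keys, and the `cleared_to_publish` rule are illustrative assumptions; the ledger spec is still in draft and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical keys, one per layer of the disclosure stack.
DISCLOSURE_LAYERS = (
    "on_frame_label",
    "in_script_acknowledgement",
    "platform_toggles",
    "caption_boilerplate",
    "prompt_guardrails",
    "audit_ledger_entry",
)


@dataclass
class LedgerEntry:
    video_id: str
    prompt: str
    model: str
    model_version: str
    render_timestamp: datetime
    approved_by: Optional[str] = None            # approval signature, if any
    layers_passed: set[str] = field(default_factory=set)

    def missing_layers(self) -> list[str]:
        """Layers not yet confirmed, in stack order."""
        return [name for name in DISCLOSURE_LAYERS if name not in self.layers_passed]

    def cleared_to_publish(self) -> bool:
        """ASSUMED rule: signed approval plus all six layers confirmed."""
        return self.approved_by is not None and not self.missing_layers()


entry = LedgerEntry(
    video_id="example-001",                      # placeholder values throughout
    prompt="(script text)",
    model="(model name)",
    model_version="(version)",
    render_timestamp=datetime.now(timezone.utc),
    approved_by="(approver)",
    layers_passed=set(DISCLOSURE_LAYERS) - {"platform_toggles"},
)
print(entry.cleared_to_publish(), entry.missing_layers())
```

Retention (7 years) and storage are deliberately left out; they depend on where the ledger ends up living.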

Open decision

How much AI content — and what mix?

The social-media review identified that 2026 platform algorithms (TikTok in particular) actively suppress AI-avatar + AI-voiceover content of the V1 shape, with documented effects on reach and shares. The right volume target depends on whether we want to optimize for cost-per-video or for actual distribution.

Option X — Hold 100/mo, all AI

Original target. Maximum throughput, lowest cost-per-video.

  • Cheapest per-video
  • Highest algorithmic suppression risk
  • Volume could mask weak engagement

Option Z — Decide post-pilot

Run the 30-day bounded pilot (~10 videos, single platform). Use the actual engagement data to set the V2 volume target.

  • Lowest commitment, most evidence-based
  • Delays the volume decision by ~6 weeks
  • Aligned with the CEO review's "bounded pilot" framing

Decision needed at the leadership meeting: Option X, Y, or Z. The engineering pipeline supports any of the three with no rework.

Next steps — what we need from leadership

Decisions needed (one short meeting)

1 · Approve a pilot budget

Roughly $130 nominal for the first month across the three pilot accounts (Hedra Pro $60, ElevenLabs Creator $22, HiggsField trial ~$50). Loaded cost during pilot ~$3–5k including legal review. Cancel any vendor that loses the bake-off.

2 · Approve the six-layer disclosure stack

See the AI Disclosure section above. Required for legal posture under FTC May 2026 and Texas SB 1188. Caption-only disclosure is insufficient.

3 · Initiate consent paperwork + voice sample

Outside counsel drafts and reviews the release. Dr. Lam signs and records 2–3 min of clean voice in a quiet room. Legal track runs in parallel to engineering.

4 · Pick a volume + content mix

Option X (100/mo all AI), Option Y (40/mo with 70% real footage), or Option Z (decide post-pilot). See "Open decision" section above. The engineering pipeline supports any of the three.

5 · Confirm 90-day success metrics

Cost-per-consult target, conversion target, kill criteria. See "What success looks like" section. Numbers in the table are defaults — leadership should set the actual targets.

6 · Designate program owner + governance roles

Script-approval signoff (Dr. Lam), comment moderation, kill-switch authority, vendor relationship owner. See "Governance pre-conditions" section.

Engineering plan once decisions land

Week 1

Provision & refactor

Pluggable provider layer in the existing codebase. No vendor commitments yet beyond month-1 trials.

  • Create Hedra / ElevenLabs / HiggsField accounts and API keys
  • Build the abstraction so we can swap tools without rewriting
  • Wire all three providers as adapters
Effort: ~1 engineering day
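
A rough sketch of the pluggable provider layer described for Week 1. Everything here is hypothetical: `RenderJob`, `VideoProvider`, and `StubProvider` are illustrative names, and the stub stands in for real vendor adapters without calling any vendor API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RenderJob:
    photo_path: str
    audio_path: str
    aspect_ratio: str = "9:16"


class VideoProvider(Protocol):
    """Interface every adapter implements, so tools can be swapped freely."""
    name: str

    def render(self, job: RenderJob) -> str:
        """Return a URI for the finished video."""
        ...


class StubProvider:
    """Placeholder adapter; a real one would call the vendor's API in render()."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def render(self, job: RenderJob) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} unavailable")
        return f"{self.name}://{job.photo_path}+{job.audio_path}"


def render_with_failover(job: RenderJob,
                         providers: list[VideoProvider]) -> tuple[str, str]:
    """Try providers in priority order; return (provider name, video URI)."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.render(job)
        except RuntimeError as exc:
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


job = RenderJob(photo_path="photo.jpg", audio_path="voice.wav")
chain = [StubProvider("hedra", healthy=False), StubProvider("higgsfield")]
print(render_with_failover(job, chain))
```

The same ordered list serves the bake-off (run every provider on one job) and the failover plan (stop at the first success), which is why the abstraction is worth building before any vendor is locked in.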
Week 2

Pilot bake-off

Same FAQ script through all three tools. Side-by-side videos delivered to leadership for the data-driven call.

  • Hedra render → 1 video
  • HiggsField render → 1 video
  • Veo render → 1 video (control)
  • Internal review meeting, pick the winner
Effort: ~½ engineering day + render time
Week 3

Lock winner, ship V2

Production runbook: one script per procedure, batch render, hand-off to social scheduler. Weekly cadence becomes routine.

  • Document the runbook in docs/SPRINT_LOG.md
  • Cancel the losing trial accounts
  • First production batch of 5–10 videos
Effort: ~1 engineering day, then ~30 min / weekly batch

Topics already locked in

Risks and how we're managing them

| Risk | Likelihood | Severity | Mitigation |
|---|---|---|---|
| Hedra quality doesn't hold for Dr. Lam specifically | Low | Medium | Pilot 1 video before committing to a month; cancel within trial window if poor. |
| Voice clone audio fidelity below expectations | Medium | Low | ElevenLabs supports re-clone in same voice slot; re-record sample if needed. |
| Vendor concentration (Bundle 1 = 2 vendors) | Medium | Medium | Pluggable provider layer; HiggsField + Kling pre-wired as failover paths; documented vendor failover SOP. |
| Vendor acquired, shut down, or 5× price spike | Low–Medium | High | Same as above. Bundle 3 framing keeps two providers warm at all times. |
| Model drift (face / voice fidelity shifts as vendor updates models) | Medium | Medium | Quarterly drift-QA against a frozen reference set. Roll back to last known-good config if drift detected. |
| Platform algorithmic suppression of AI content (TikTok 2026 policy) | Medium–High | Medium | Depends on volume / mix decision. Option Y (40/mo, 70% real footage) is the strategist's mitigation. Option Z defers via pilot. |
| Patient confusion or backlash at AI-generated doctor content | Medium | High | Full six-layer disclosure stack. Real signed consent. On-frame label. Active comment moderation. |
| Medical board scrutiny (TX): AI doctor making health statements | Low–Medium | High | Outside counsel review of release + disclosure plan before launch. Script template forbids outcome / treatment claims. |
| Malpractice exposure if AI says something Dr. Lam wouldn't | Low | High | Mandatory per-batch script approval by Dr. Lam. Prompt-level guard-rails. Insurance confirmation pre-launch. |
| Regulatory shift (FTC, state medical board, platform policy) mid-rollout | Medium | Medium | Quarterly counsel review. Pluggable layer allows fast pivots. Pilot phase tests one platform before scaling. |
| Comment moderation overload: hostile or confused comments at scale | Medium | Medium | Documented moderation policy. On-call rotation. Pre-prepared response templates. |
| Hidden ops costs erode the cost advantage | Medium | Medium | Fully-loaded cost table presented to leadership up front. 90-day decision gate uses loaded TCO, not nominal. |
| Staff / physician morale (real doctors being AI-replicated) | Low–Medium | Low | Frame as augmenting Dr. Lam's reach, not replacing him. Engage clinical team in approval workflow. |