For Clients

The Video Generation
Pipeline

From your car images to a finished marketing video — 9 automated steps, 5 AI services, fully orchestrated in the background.

9Steps
5AI Services
~3Minutes
1MP4 Output
You Provide
  • 📸
    3–4 Car Photos
    JPEG or PNG, max 10 MB each
  • 🚗
    Car Brand & Model
    e.g. Toyota Land Cruiser 2024
  • 👤
    Presenter Gender
    Male or Female AI presenter
  • 🌐
    Language
    Arabic (ar) or English (en)
You Receive
  • 🤖
    AI Presenter Video
    Talking-head intro + outro segment
  • 🎙️
    Professional Voiceover
    Natural TTS in your language
  • 🎬
    Cinematic Car Clip
    10-second 1080p showcase video
  • Final Merged MP4
    Intro → Showcase → Outro, one file

9-Step Automated Pipeline

Every step runs automatically inside a background queue worker. You submit once and get notified when the video is ready.

1

Receive & Store Images

Laravel ~1s

Your uploaded car photos are securely received, validated for file type and size, then saved to a private storage folder identified by your unique job ID.

Accepts 3–4 JPEG/PNG files, max 10 MB each. MIME type is verified server-side.
2

Generate AI Personality

Seedream 3.0 20–40s

A photorealistic AI presenter image is generated based on your preferences — gender, car brand, model, and year — using an intelligent prompt to produce a professional-looking character.

Output: 768×1024 PNG. Stored as the presenter image for the video.
3

Merge Presenter + Car Image

Seedream 3.0 20–40s

The AI presenter is composited onto your car image with natural lighting and realistic placement, creating a single hero image of the presenter in front of the car.

Uses Seedream's /images/compose endpoint. The result is the base for all video segments.
4

Extract Car Description

OpenAI GPT-4o 5–10s

GPT-4o Vision analyses all your car photos and writes exactly 3 compelling marketing sentences describing the exterior design, key features, and driving experience — in your chosen language.

Output: 20-second voiceover script in Arabic or English. Max 300 tokens.
5

Generate Audio Voiceover

ElevenLabs TTS 5–15s

The car description is converted into a natural-sounding MP3 voiceover using ElevenLabs Multilingual v2 — a single audio file that will be used across both the intro and outro video segments.

One audio file is generated and reused — this prevents any audio conflict in the final video.
6

Generate Intro Video

OmniHuman 1.5 60–90s

The merged presenter image is animated using the voiceover audio. OmniHuman 1.5 creates a realistic talking-head video of the presenter speaking — this becomes the opening segment.

Async: result is polled every 5 seconds. Max wait: 5 minutes. Output: intro.mp4
7

Generate Outro Video

OmniHuman 1.5 60–90s

The same presenter image and audio are used to generate a closing talking-head segment. Using the same audio file for both intro and outro ensures perfect audio consistency throughout the final video.

Same image + same audio = no audio conflict. Output: outro.mp4
8

Generate Car Showcase Clip

Seedance 1.5 60–90s

Seedance 1.5 creates a cinematic 10-second 1080p promotional video of your car using all uploaded images. Smooth camera movements, showroom lighting, and an automotive advertisement aesthetic.

Duration: 10s. Resolution: 1080p. Output: showcase.mp4
9

Final Merge — Deliver MP4

FFmpeg 6.0+ 10–20s

All three video clips are normalised to the same resolution and frame rate, then concatenated in order: Intro → Car Showcase → Outro. The result is your final, ready-to-publish marketing video.

Scale: 1920×1080 · 30fps · libx264 / AAC. Output: final.mp4
Pipeline Complete
Your video URL is delivered via the status endpoint — ready to download or embed.

AI Services Used

Five specialised AI platforms — each responsible for a distinct stage.

Seedream 3.0
Image Generation

Generates & composites the AI presenter image.

Steps 2 & 3
OpenAI GPT-4o
Vision / NLP

Analyses car images and writes the voiceover script.

Step 4
ElevenLabs
Text-to-Speech

Converts the script to a natural MP3 voiceover.

Step 5
OmniHuman 1.5
Talking-Head Video

Animates the presenter lip-synced to the voiceover.

Steps 6 & 7
Seedance 1.5
Cinematic Video

Creates a 10-second 1080p car showcase clip.

Step 8
FFmpeg 6.0+
Video Processing

Normalises and concatenates all clips into one MP4.

Step 9

Generation Time

Total: approximately 3–7 minutes depending on AI service load.

Steps 1 Receive & Store Images
1–3s
Steps 2–3 AI Personality & Merge
40–80s
Step 4 Car Description (GPT-4o)
5–10s
Step 5 Audio Voiceover (TTS)
5–15s
Step 6 Intro Video
60–90s
Step 7 Outro Video
60–90s
Step 8 Car Showcase Clip
60–90s
Step 9 Final Merge (FFmpeg)
10–20s
Best
~4 min
Typical
~5 min
Worst
~7 min

Ready to integrate the API?

The developer documentation has full code examples, endpoint specs, and a quick-start guide.