For Clients

The Video Generation
Pipeline

From your car images to a finished marketing video — 9 automated steps, 5 AI services, fully orchestrated in the background.

9Steps

5AI Services

~3Minutes

1MP4 Output

You Provide

📸

3–4 Car Photos

JPEG or PNG, max 10 MB each
🚗

Car Brand & Model

e.g. Toyota Land Cruiser 2024
👤

Presenter Gender

Male or Female AI presenter
🌐

Language

Arabic (ar) or English (en)

You Receive

🤖

AI Presenter Video

Talking-head intro + outro segment
🎙️

Professional Voiceover

Natural TTS in your language
🎬

Cinematic Car Clip

10-second 1080p showcase video
✅

Final Merged MP4

Intro → Showcase → Outro, one file

9-Step Automated Pipeline

Every step runs automatically inside a background queue worker. You submit once and get notified when the video is ready.

Receive & Store Images

Laravel ~1s

Your uploaded car photos are securely received, validated for file type and size, then saved to a private storage folder identified by your unique job ID.

Accepts 3–4 JPEG/PNG files, max 10 MB each. MIME type is verified server-side.

Generate AI Personality

Seedream 3.0 20–40s

A photorealistic AI presenter image is generated based on your preferences — gender, car brand, model, and year — using an intelligent prompt to produce a professional-looking character.

Output: 768×1024 PNG. Stored as the presenter image for the video.

Merge Presenter + Car Image

Seedream 3.0 20–40s

The AI presenter is composited onto your car image with natural lighting and realistic placement, creating a single hero image of the presenter in front of the car.

Uses Seedream's /images/compose endpoint. The result is the base for all video segments.

Extract Car Description

OpenAI GPT-4o 5–10s

GPT-4o Vision analyses all your car photos and writes exactly 3 compelling marketing sentences describing the exterior design, key features, and driving experience — in your chosen language.

Output: 20-second voiceover script in Arabic or English. Max 300 tokens.

Generate Audio Voiceover

ElevenLabs TTS 5–15s

The car description is converted into a natural-sounding MP3 voiceover using ElevenLabs Multilingual v2 — a single audio file that will be used across both the intro and outro video segments.

One audio file is generated and reused — this prevents any audio conflict in the final video.

Generate Intro Video

OmniHuman 1.5 60–90s

The merged presenter image is animated using the voiceover audio. OmniHuman 1.5 creates a realistic talking-head video of the presenter speaking — this becomes the opening segment.

Async: result is polled every 5 seconds. Max wait: 5 minutes. Output: intro.mp4

Generate Outro Video

OmniHuman 1.5 60–90s

The same presenter image and audio are used to generate a closing talking-head segment. Using the same audio file for both intro and outro ensures perfect audio consistency throughout the final video.

Same image + same audio = no audio conflict. Output: outro.mp4

Generate Car Showcase Clip

Seedance 1.5 60–90s

Seedance 1.5 creates a cinematic 10-second 1080p promotional video of your car using all uploaded images. Smooth camera movements, showroom lighting, and an automotive advertisement aesthetic.

Duration: 10s. Resolution: 1080p. Output: showcase.mp4

Final Merge — Deliver MP4

FFmpeg 6.0+ 10–20s

All three video clips are normalised to the same resolution and frame rate, then concatenated in order: Intro → Car Showcase → Outro. The result is your final, ready-to-publish marketing video.

Scale: 1920×1080 · 30fps · libx264 / AAC. Output: final.mp4

Pipeline Complete

Your video URL is delivered via the status endpoint — ready to download or embed.

AI Services Used

Five specialised AI platforms — each responsible for a distinct stage.

Seedream 3.0

Image Generation

Generates & composites the AI presenter image.

Steps 2 & 3

OpenAI GPT-4o

Vision / NLP

Analyses car images and writes the voiceover script.

Step 4

ElevenLabs

Text-to-Speech

Converts the script to a natural MP3 voiceover.

Step 5

OmniHuman 1.5

Talking-Head Video

Animates the presenter lip-synced to the voiceover.

Steps 6 & 7

Seedance 1.5

Cinematic Video

Creates a 10-second 1080p car showcase clip.

Step 8

FFmpeg 6.0+

Video Processing

Normalises and concatenates all clips into one MP4.

Step 9

Generation Time

Total: approximately 3–7 minutes depending on AI service load.

Steps 1 Receive & Store Images

1–3s

Steps 2–3 AI Personality & Merge

40–80s

Step 4 Car Description (GPT-4o)

5–10s

Step 5 Audio Voiceover (TTS)

5–15s

Step 6 Intro Video

60–90s

Step 7 Outro Video

60–90s

Step 8 Car Showcase Clip

60–90s

Step 9 Final Merge (FFmpeg)

10–20s

Best

~4 min

Typical

~5 min

Worst

~7 min

Ready to integrate the API?

The developer documentation has full code examples, endpoint specs, and a quick-start guide.

Open API Documentation Back to Home

The Video GenerationPipeline

9-Step Automated Pipeline

Receive & Store Images

Generate AI Personality

Merge Presenter + Car Image

Extract Car Description

Generate Audio Voiceover

Generate Intro Video

Generate Outro Video

Generate Car Showcase Clip

Final Merge — Deliver MP4

AI Services Used

Generation Time

Ready to integrate the API?

The Video Generation
Pipeline