The Video Generation Pipeline
From your car images to a finished marketing video — 9 automated steps, 5 AI services, fully orchestrated in the background.
- 📸 3–4 Car Photos: JPEG or PNG, max 10 MB each
- 🚗 Car Brand & Model: e.g. Toyota Land Cruiser 2024
- 👤 Presenter Gender: Male or Female AI presenter
- 🌐 Language: Arabic (ar) or English (en)

- 🤖 AI Presenter Video: Talking-head intro + outro segment
- 🎙️ Professional Voiceover: Natural TTS in your language
- 🎬 Cinematic Car Clip: 10-second 1080p showcase video
- ✅ Final Merged MP4: Intro → Showcase → Outro, one file
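As a rough illustration, the four inputs listed above (photos, brand and model, presenter gender, language) could be checked on the client before submitting. The class name `VideoJobRequest` and its field names are hypothetical and not taken from the actual API; only the allowed values come from this page.

```python
from dataclasses import dataclass

# Allowed values as stated on this page; the lowercase spelling is an assumption.
ALLOWED_GENDERS = {"male", "female"}
ALLOWED_LANGUAGES = {"ar", "en"}


@dataclass(frozen=True)
class VideoJobRequest:
    """Hypothetical container for the four inputs of a video job."""

    photo_paths: tuple
    car_brand_model: str
    presenter_gender: str
    language: str

    def __post_init__(self) -> None:
        # The page asks for 3-4 car photos.
        if not 3 <= len(self.photo_paths) <= 4:
            raise ValueError("expected 3 or 4 car photos")
        if not self.car_brand_model.strip():
            raise ValueError("car brand and model must not be empty")
        if self.presenter_gender not in ALLOWED_GENDERS:
            raise ValueError("presenter_gender must be 'male' or 'female'")
        if self.language not in ALLOWED_LANGUAGES:
            raise ValueError("language must be 'ar' or 'en'")
```

For example, `VideoJobRequest(("front.jpg", "side.jpg", "rear.png"), "Toyota Land Cruiser 2024", "female", "ar")` passes these checks, while a request with a single photo raises `ValueError`.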
9-Step Automated Pipeline
Every step runs automatically inside a background queue worker. You submit once and get notified when the video is ready.
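The "submit once, get notified" flow can be pictured with a minimal in-process sketch. It assumes a single worker thread and a plain `queue.Queue`; the real service's queue technology, step handlers, and notification mechanism are not described on this page, so every name here is a placeholder.

```python
import queue
import threading
from typing import Callable

# Step names mirror the nine stages of the pipeline; handlers are placeholders.
STEP_NAMES = [
    "receive_and_store_images",
    "generate_presenter",
    "merge_presenter_and_car",
    "extract_car_description",
    "generate_voiceover",
    "generate_intro_video",
    "generate_outro_video",
    "generate_showcase_clip",
    "final_merge",
]


def make_placeholder(name: str) -> Callable[[dict], None]:
    def step(job: dict) -> None:
        # A real handler would call an external service here.
        job["completed"].append(name)
    return step


STEPS = [make_placeholder(name) for name in STEP_NAMES]


def worker(jobs: queue.Queue, on_done: Callable[[dict], None]) -> None:
    """Take jobs off the queue and run every step in order."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        try:
            for step in STEPS:
                step(job)
            job["status"] = "ready"
        except Exception as exc:  # any failing step marks the whole job failed
            job["status"] = "failed"
            job["error"] = str(exc)
        finally:
            on_done(job)  # stand-in for the "video is ready" notification
            jobs.task_done()


def run_demo() -> dict:
    jobs: queue.Queue = queue.Queue()
    finished = []
    thread = threading.Thread(target=worker, args=(jobs, finished.append), daemon=True)
    thread.start()
    jobs.put({"id": "demo-job", "completed": [], "status": "queued"})
    jobs.put(None)
    jobs.join()  # wait until the job and the sentinel are both processed
    return finished[0]
```

Calling `run_demo()` returns the job dictionary after all nine placeholder steps have run, with its status set to `"ready"`.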
Receive & Store Images
Your uploaded car photos are securely received, validated for file type and size, then saved to a private storage folder identified by your unique job ID.
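A minimal sketch of this kind of check, assuming file types are detected from the leading bytes and that "10 MB" means 10 MiB. The function names and the `car_<n>` file naming are illustrative only; the service's actual storage layout is not documented here.

```python
import uuid
from pathlib import Path
from typing import List, Optional

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB
JPEG_MAGIC = b"\xff\xd8\xff"  # first bytes of every JPEG file
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature


def detect_image_type(data: bytes) -> Optional[str]:
    """Return 'jpg' or 'png' based on the file signature, else None."""
    if data.startswith(JPEG_MAGIC):
        return "jpg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None


def store_images(images: List[bytes], root: Path) -> Path:
    """Validate every image first, then write them to a per-job folder."""
    checked = []
    for data in images:
        if len(data) > MAX_BYTES:
            raise ValueError("image exceeds the 10 MB limit")
        ext = detect_image_type(data)
        if ext is None:
            raise ValueError("only JPEG and PNG images are accepted")
        checked.append((data, ext))

    job_dir = root / uuid.uuid4().hex  # hypothetical job ID format
    job_dir.mkdir(parents=True)
    for index, (data, ext) in enumerate(checked, start=1):
        (job_dir / f"car_{index}.{ext}").write_bytes(data)
    return job_dir
```

Validating all files before creating the folder means a rejected upload leaves nothing behind on disk.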
Generate AI Personality
A photorealistic AI presenter image is generated from your inputs (presenter gender plus the car's brand, model, and year), using a prompt built to produce a professional-looking character.
Merge Presenter + Car Image
The AI presenter is composited onto your car image with natural lighting and realistic placement, creating a single hero image of the presenter in front of the car.
Extract Car Description
GPT-4o Vision analyses all your car photos and writes exactly 3 compelling marketing sentences describing the exterior design, key features, and driving experience — in your chosen language.
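Because the script is meant to contain exactly three sentences, a pipeline might sanity-check the model's output before passing it to the voiceover step. The following is a rough sketch under that assumption; it splits on `.`, `!`, `?` and the Arabic question mark `؟`, so abbreviations such as "e.g." would be miscounted. Whether the real service performs any such check is not stated.

```python
import re
from typing import List

# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter for short English or Arabic marketing copy."""
    parts = SENTENCE_END.split(text.strip())
    return [part for part in parts if part]


def has_exactly_three_sentences(text: str) -> bool:
    return len(split_sentences(text)) == 3
```

A description that fails the check could be regenerated rather than sent on to text-to-speech.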
Generate Audio Voiceover
The car description is converted into a natural-sounding MP3 voiceover using ElevenLabs Multilingual v2 — a single audio file that will be used across both the intro and outro video segments.
Generate Intro Video
The merged presenter image is animated using the voiceover audio. OmniHuman 1.5 creates a realistic talking-head video of the presenter speaking — this becomes the opening segment.
Generate Outro Video
The same presenter image and audio are used to generate a closing talking-head segment. Using the same audio file for both intro and outro ensures perfect audio consistency throughout the final video.
Generate Car Showcase Clip
Seedance 1.5 creates a cinematic 10-second 1080p promotional video of your car using all uploaded images. The clip features smooth camera movements, showroom lighting, and an automotive-advertisement aesthetic.
Final Merge — Deliver MP4
All three video clips are normalised to the same resolution and frame rate, then concatenated in order: Intro → Car Showcase → Outro. The result is your final, ready-to-publish marketing video.
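The normalise-then-concatenate step can be sketched as an FFmpeg command. This page does not say which tool performs the merge, so FFmpeg, the 1920x1080 target, and the 30 fps frame rate are all assumptions. The sketch also handles video streams only; audio mapping is left out because the showcase clip may not carry an audio track.

```python
from typing import List


def build_merge_command(
    intro: str,
    showcase: str,
    outro: str,
    output: str,
    width: int = 1920,   # assumed target resolution (1080p)
    height: int = 1080,
    fps: int = 30,       # assumed target frame rate
) -> List[str]:
    """Build an ffmpeg argument list that normalises and concatenates 3 clips."""
    inputs = [intro, showcase, outro]  # order: Intro -> Car Showcase -> Outro
    cmd = ["ffmpeg", "-y"]
    for path in inputs:
        cmd += ["-i", path]

    chains = []
    for i in range(len(inputs)):
        # Scale to fit, pad to the exact frame size, then fix the frame rate.
        chains.append(
            f"[{i}:v]scale={width}:{height}:force_original_aspect_ratio=decrease,"
            f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1,fps={fps}[v{i}]"
        )
    labels = "".join(f"[v{i}]" for i in range(len(inputs)))
    chains.append(f"{labels}concat=n={len(inputs)}:v=1:a=0[v]")  # video only

    cmd += [
        "-filter_complex", ";".join(chains),
        "-map", "[v]",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output,
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed. Normalising every clip before the `concat` filter matters because that filter expects all of its inputs to share the same frame size.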
AI Services Used
Five specialised AI platforms — each responsible for a distinct stage.
- Steps 2 & 3: Generates & composites the AI presenter image.
- Step 4 (GPT-4o Vision): Analyses car images and writes the voiceover script.
- Step 5 (ElevenLabs Multilingual v2): Converts the script to a natural MP3 voiceover.
- Steps 6 & 7 (OmniHuman 1.5): Animates the presenter lip-synced to the voiceover.
- Step 8 (Seedance 1.5): Creates a 10-second 1080p car showcase clip.
- Step 9: Normalises and concatenates all clips into one MP4.
Generation Time
Total: approximately 3–7 minutes depending on AI service load.
Ready to integrate the API?
The developer documentation has full code examples, endpoint specs, and a quick-start guide.