How to Use the Veo API for AI Video Generation in 2026

Get started with the Veo API on Segmind: real Veo 3 and Veo 3 Fast generations, exact code, and pricing across four production use cases.


I started 2026 expecting AI video to plateau. The opposite happened. Search interest in “AI video generation” has risen sharply over the last quarter, and the models we host on Segmind have shifted from novelty to actual production tools. 

The clearest example: Veo 3 from Google DeepMind. 

Native synchronized audio, cinematic motion, and a clean HTTP API mean a developer can go from a prompt to a finished, shot-ready clip in roughly 1.5 minutes. No GPU to provision, no diffusion knobs to tune.

This post is the version of “getting started with the Veo API” I wish I had when I first wired it into a side project. I ran four real generations covering the use cases I am most often asked about: founders building marketing tools, film studios looking at pre-vis, and content houses publishing at MCN (multi-channel network) scale. I will show you the exact requests, what came back, what each one cost, and where I would not use it.

So, are you ready to build with Veo 3? Explore the Veo API on Segmind and start generating cinematic AI videos today. 

TL;DR

  • Production Ready: The Veo API is useful when AI video needs to move beyond experiments and into real workflows for ads, pre-vis, product visuals, and short-form content.
  • Audio Included: Veo 3 can generate video and synchronized audio together, which reduces the need for separate tools for voice, music, or sound design in early drafts.
  • Model Choice: Use Veo 3 when quality and cinematic output matter; use Veo 3 Fast when speed, lower cost, and high-volume iteration matter more.
  • Workflow Fit: The strongest use cases include marketing ad variants, film shot ideation, MCN-scale short-form videos, and developer-led video generation within products.
  • Practical Limits: Veo 3 is strong for hero clips and standalone videos, but outputs still need review when the workflow requires strict character continuity, exact text, or brand-sensitive polish.

What Is the Veo 3 API? 

Veo 3 is Google DeepMind’s text-to-video model, available on Segmind through a serverless API endpoint. You POST a JSON payload with a prompt and optional parameters. Depending on the endpoint response format, either save the returned video binary or extract the video URL from the JSON response.

Because the model generates synchronized audio, you get video and sound from the same call instead of building a separate sound-design pipeline. Each request is a single direct POST: there is no polling and no job queue to babysit.
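Because the endpoint may return either the video binary or a JSON body that contains a URL, a small helper can branch on the response's Content-Type. This is a sketch under assumptions: the real JSON field name is not confirmed here, so the JSON branch returns the first string value that starts with "http". Check the endpoint's documented response schema before relying on it.

```python
from __future__ import annotations

import json


def _walk(node):
    """Yield every leaf value in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for value in node:
            yield from _walk(value)
    else:
        yield node


def extract_video(content_type: str, body: bytes) -> tuple[bytes | None, str | None]:
    """Return (video_bytes, None) for a binary reply or (None, url) for a JSON reply.

    Assumption: the JSON field holding the URL is unknown, so this returns the
    first string value that starts with "http".
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type.startswith("video/") or media_type == "application/octet-stream":
        return body, None
    if media_type == "application/json":
        for value in _walk(json.loads(body)):
            if isinstance(value, str) and value.startswith("http"):
                return None, value
        raise ValueError("JSON response did not contain a video URL")
    raise ValueError(f"unexpected content type: {content_type!r}")
```

With requests, you would call it as `extract_video(response.headers.get("Content-Type", ""), response.content)` and either write the bytes to disk or download the returned URL.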

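It comes in two flavors that share the same API shape: Veo 3 (the veo-3 endpoint), the full-quality model for cinematic work, and Veo 3 Fast (the veo-3-fast endpoint), a cheaper variant for iteration and high-volume short-form content.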

Both accept the same parameters: prompt (required), optional image_url for image-to-video, duration (4, 6, or 8 seconds), aspect_ratio (16:9, 4:3, 1:1, 3:4, 9:16), resolution (720p for standard HD and 1080p for full HD), fps, generate_audio, and seed. The default settings are good enough for most early prototypes, so you can start with just a prompt and add parameters later. 
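The parameter list above can be captured in a small payload builder that rejects values outside the documented options before you spend a call. This is a minimal sketch: `build_veo_payload` is a hypothetical helper, parameters left as `None` are omitted so the API defaults apply, and sending `duration` as a string only mirrors the skincare request later in this post rather than a confirmed schema.

```python
from __future__ import annotations

# Options as listed in this post; verify against the current endpoint docs.
ALLOWED_DURATIONS = {4, 6, 8}
ALLOWED_ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def build_veo_payload(
    prompt: str,
    *,
    duration: int | None = None,
    aspect_ratio: str | None = None,
    resolution: str | None = None,
    fps: int | None = None,
    generate_audio: bool | None = None,
    seed: int | None = None,
    image_url: str | None = None,
) -> dict:
    """Build a request body, validating the enumerated options before sending."""
    if not prompt:
        raise ValueError("prompt is required")
    if duration is not None and duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if aspect_ratio is not None and aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")

    optional = {
        # Sent as a string to mirror the example requests in this post.
        "duration": str(duration) if duration is not None else None,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "fps": fps,
        "generate_audio": generate_audio,
        "seed": seed,
        "image_url": image_url,
    }
    payload = {"prompt": prompt}
    # Keep False and 0 (explicit choices); drop only unset parameters.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```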

Want to learn more about building with the Veo API? Explore Veo 3 and Veo 3 Fast on Segmind to generate your first AI video!

Why the Veo API Matters for AI Video Generation in 2026 

Two things are true about the Veo API right now that were not true six months ago. 

First, latency. In my runs, a single 8-second clip came back in roughly 1 to 1.5 minutes, which is finally fast enough to live inside an interactive editor without breaking flow. 

Second, audio. Most of the cost in a "produced" video is the sound design pass: voiceover, foley, ambient bed, score. Veo 3 ships those bundled. You enable generate_audio, and the returned MP4 already has a synchronized track. 

In many use cases, that single feature collapses a four-tool pipeline into a single API call.

Use case 1: Creating Ad Variants for Marketing Agencies 

A typical performance-marketing agency ships 30 to 80 ad variants per week per client. Most of that work is iteration on the same product hero shot with different settings, music beds, and voiceover lines. 

Until recently, that meant a junior creative spending a full day in After Effects per variant. With Veo 3, you can describe the spot in a paragraph and get back a finished hero clip with audio in 1 to 1.5 minutes.

Here is the exact request I ran for a luxury skincare ad variant:

Prompt used: A 8-second luxury skincare ad. Macro shot of a glass dropper releasing a single golden serum drop into a frosted glass dish, slow motion at 120fps. Soft warm window light from the left, marble vanity surface, eucalyptus sprig in soft focus background. The drop hits the surface, ripples expand. Camera slowly pulls back to reveal the product bottle: amber glass, minimalist black label reading 'AURELIE'. Ambient piano keys with a subtle hi-hat layer. Cinematic, premium beauty commercial aesthetic, 16:9, 4K.

Parameters: duration: 8  |  aspect_ratio: 16:9  |  resolution: 1080p  |  generate_audio: true  |  fps: 24

Veo 3 output, 8s with synchronized audio. Cost: $3.20 per generation.

And here is the call that produced it:

import requests

response = requests.post(
    "https://api.segmind.com/v1/veo-3",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "prompt": "A 8-second luxury skincare ad. Macro shot of a glass dropper...",
        "duration": "8",
        "aspect_ratio": "16:9",
        "resolution": "1080p",
        "generate_audio": True,
        "fps": 24
    }
)

response.raise_for_status()  # stop on HTTP errors instead of saving an error body as .mp4
with open("ad-variant.mp4", "wb") as f:
    f.write(response.content)

The interesting thing for an agency is the unit economics. At $3.20 per 8-second clip with audio, generating 50 ad variants with Veo 3 costs $160. The salary cost of a junior creative producing the same volume is roughly two orders of magnitude higher. 

The product shot, the lighting, and the audio bed all came from a single string. The brand can stay consistent by pinning a seed value once a variant lands and re-rolling adjacent prompts off the same seed.
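The seed-pinning workflow can be sketched as a function that turns one locked variant into a batch of payloads that differ only in the prompt text. `seed_locked_variants`, the seed value, and the example voiceover lines are hypothetical; each payload would be POSTed to the veo-3 endpoint exactly like the request above, and how closely the outputs track each other is something to verify on your own prompts.

```python
from __future__ import annotations


def seed_locked_variants(base_prompt: str, variations: list[str], seed: int, **params) -> list[dict]:
    """Return one payload per variation, all reusing the seed of a variant that landed.

    Only the prompt text changes; every other parameter is copied unchanged.
    """
    return [
        {"prompt": f"{base_prompt} {variation}".strip(), "seed": seed, **params}
        for variation in variations
    ]


# Hypothetical example: 123456 stands in for the seed recorded from a good take.
variants = seed_locked_variants(
    "An 8-second luxury skincare ad. Macro shot of a golden serum drop on marble.",
    [
        "Voiceover: 'Light, bottled.'",
        "Voiceover: 'Your evening ritual.'",
        "No voiceover, ambient piano only.",
    ],
    seed=123456,
    duration="8",
    aspect_ratio="16:9",
    resolution="1080p",
    generate_audio=True,
)
```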

Use case 2: Film Pre-Visualization and Shot Ideation 

A VFX supervisor I talked to last month described pre-vis as "the most important wasted budget in our pipeline." You spend a week of an animator's time blocking shots that get thrown away the moment the director walks on set. Veo 3 collapses that into an afternoon. You hand the model a script line and a camera direction, and it returns a watchable cinematic clip you can use as a planning artifact.

This is where Veo 3 (not Fast) earns its price. The full model handles film grain, anamorphic-feeling lens distortion, and motivated lighting in ways the cheaper variant cannot match. Here is a noir establishing shot I ran for a fictional Tokyo thriller:

Prompt used: A cinematic establishing shot of a lone figure in a long charcoal coat walking down a rain-slicked Tokyo alley at 2am. Neon signs reflect in puddles, steam rises from a ramen vendor's cart in the foreground, the figure's silhouette is backlit. Slow dolly-in, 35mm anamorphic lens feel, cyan and magenta color grade, light rain falling. Distant city hum, footsteps on wet asphalt, a low-frequency synth drone. Blade Runner aesthetic, 16:9, cinematic.

Parameters: duration: 8  |  aspect_ratio: 16:9  |  resolution: 1080p  |  generate_audio: true  |  fps: 24

Veo 3 cinematic establishing shot. The footsteps and ambient drone are generated, not added in post.

For a studio, the interesting workflow is what you do with the clip after it lands. Pin the seed, vary the prompt slightly to explore alternate framings, and you get a ten-shot mood reel by lunch. Production designers can hand the reel to the director without having to schedule a single concept artist. 

Where Veo 3 still falls short is character continuity across cuts. If you need the same person across shots, create a consistent reference image of that character first, then run each shot as an image-to-video pass by passing that image to the model through the image_url parameter.
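One way to set up that image-to-video pass, sketched under assumptions: every shot in a sequence reuses the same publicly accessible reference image through image_url, while the prompt changes per shot. The helper name, the example URL, and the shot prompts are hypothetical, and a shared starting frame should improve continuity without guaranteeing it.

```python
from __future__ import annotations

from urllib.parse import urlparse


def anchored_shots(reference_image_url: str, shot_prompts: list[str], **params) -> list[dict]:
    """Build image-to-video payloads that all start from the same reference frame."""
    parsed = urlparse(reference_image_url)
    # image_url is expected to be publicly accessible, so require an http(s) URL.
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        raise ValueError("image_url must be a public http(s) URL")
    return [{"prompt": prompt, "image_url": reference_image_url, **params} for prompt in shot_prompts]


# Hypothetical reference frame and shot list for the Tokyo noir sequence.
shots = anchored_shots(
    "https://example.com/detective-reference.png",
    [
        "The figure in the charcoal coat stops under a flickering neon sign, slow push-in.",
        "Close-up: rain runs off the figure's collar as they glance over one shoulder.",
    ],
    duration="8",
    aspect_ratio="16:9",
    generate_audio=True,
)
```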

Use case 3: Short-Form Video Production for MCNs and Content Teams 

Multi-channel networks running 200 to 1000 short-form videos a month are the loudest customer segment for AI video right now. The economics are punishing: a $30 per-video editor cost on a thousand videos is a $30,000 line item. Vertical 9:16 output with native audio drops that to a fraction. This is the use case where Veo 3 Fast pays for itself, not the full model.

A vertical sourdough recipe short, generated in one call:

Prompt used: A vertical 9:16 YouTube Short style clip. A friendly home cook in a sunlit kitchen lifts a freshly baked sourdough loaf out of a Dutch oven, steam rising. Quick camera push-in on the cracked golden crust, then a clean overhead shot of the loaf being scored. Bright kitchen, white tile backsplash, plants on the windowsill. Upbeat acoustic guitar with hand claps. Energetic, social-friendly, hook-heavy first 2 seconds.

Parameters: endpoint: veo-3-fast  |  duration: 8  |  aspect_ratio: 9:16  |  resolution: 1080p  |  generate_audio: true

Veo 3 Fast at 9:16, $1.20 per generation. The acoustic guitar bed is generated.

With Veo 3 Fast, an 8-second vertical clip with audio costs $1.20, so an MCN producing 1,000 videos a month is looking at $1,200 in inference costs. That is below the cost of one editor, and throughput scales with API calls rather than headcount. The right architecture is a queue worker that fans out prompts, calls the API, saves each MP4, and hands it to a thumbnailing or captioning step. There is no async polling to manage because the response itself returns the binary.
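That fan-out pattern can be sketched with a thread pool, since each call mostly waits on the network. `fan_out`, the `generate` callable, and the `on_saved` hook are hypothetical names: `generate` would wrap the veo-3-fast request from this post and return the MP4 bytes, and `on_saved` is where a thumbnailing or captioning step would plug in. Keep `max_workers` within whatever concurrency your account allows.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Optional


def fan_out(
    prompts: list[str],
    generate: Callable[[str], bytes],
    out_dir: Path,
    max_workers: int = 4,
    on_saved: Optional[Callable[[Path], None]] = None,
) -> list[Path]:
    """Generate one clip per prompt concurrently; return saved paths in prompt order."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def job(index: int, prompt: str) -> Path:
        path = out_dir / f"short-{index:04d}.mp4"
        path.write_bytes(generate(prompt))  # generate() returns the MP4 bytes
        if on_saved is not None:
            on_saved(path)  # e.g. enqueue thumbnailing or captioning
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job, i, p) for i, p in enumerate(prompts)]
        # result() re-raises any exception raised by a failed generation.
        return [future.result() for future in futures]
```

Threads fit here because the work is I/O-bound; a real queue worker would add retries and persist which prompts already succeeded.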

How to Make Your First Veo API Request 

Here is the smallest possible working call. It will return an MP4 you can write to disk and play immediately.

import requests

response = requests.post(
    "https://api.segmind.com/v1/veo-3-fast",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={"prompt": "A clean product visualization: a minimalist purple and white SaaS dashboard appears on a floating MacBook Pro screen, rotating slowly 360 degrees on a soft gradient background"}
)

response.raise_for_status()
with open("output.mp4", "wb") as f:
    f.write(response.content)

Veo 3 Fast, 4 seconds, no audio. The cheapest end of the price curve at $0.40.
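Two small hardening steps are worth sketching for the first call: read the key from an environment variable instead of hard-coding it, and set a generous timeout, since a clip can take a minute or more to come back. This version uses only the standard library (urllib) so it runs without requests installed; `SEGMIND_API_KEY`, `build_request`, `generate_to_file`, and the 300-second timeout are assumptions, not Segmind conventions.

```python
import json
import os
import urllib.request

SEGMIND_BASE_URL = "https://api.segmind.com/v1"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST the requests examples send: JSON body plus x-api-key header."""
    return urllib.request.Request(
        f"{SEGMIND_BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(endpoint: str, payload: dict, path: str, timeout: float = 300) -> None:
    """Send the request and write the response body (assumed MP4 binary) to disk."""
    api_key = os.environ["SEGMIND_API_KEY"]  # hypothetical variable name; KeyError if unset
    request = build_request(endpoint, payload, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, so error bodies never reach the file.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        video = response.read()
    with open(path, "wb") as f:
        f.write(video)
```

Usage: `generate_to_file("veo-3-fast", {"prompt": "A clean product visualization..."}, "output.mp4")`.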

Veo API Pricing: How Much Does Veo 3 & Veo 3 Fast Cost?

Per the published rate card, here is what you actually pay per call for Veo 3 and Veo 3 Fast:

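| Model | Duration | Audio | Cost per call |
|---|---|---|---|
| Veo 3 | 4s | off | $0.80 |
| Veo 3 | 4s | on | $1.60 |
| Veo 3 | 6s | off | $1.20 |
| Veo 3 | 6s | on | $2.40 |
| Veo 3 | 8s | off | $1.60 |
| Veo 3 | 8s | on | $3.20 |
| Veo 3 Fast | 4s | off | $0.40 |
| Veo 3 Fast | 4s | on | $0.60 |
| Veo 3 Fast | 6s | off | $0.60 |
| Veo 3 Fast | 6s | on | $0.90 |
| Veo 3 Fast | 8s | off | $0.80 |
| Veo 3 Fast | 8s | on | $1.20 |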

For my four-clip test run in this post, I spent exactly $8.00 across the matrix. Audio exactly doubles the cost on the full model and adds 50% on Fast, so if you are iterating on visuals only, set generate_audio: false until the shot is locked.
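The rate card above is easy to turn into a batch estimator. This sketch stores prices in cents to avoid floating-point rounding; the (endpoint, seconds, audio) keys and the helper names are hypothetical, and the prices are copied from the table, so re-check them against the current rate card before budgeting.

```python
# Rate card from the table above, in cents, keyed by (endpoint, seconds, audio).
RATE_CARD_CENTS = {
    ("veo-3", 4, False): 80, ("veo-3", 4, True): 160,
    ("veo-3", 6, False): 120, ("veo-3", 6, True): 240,
    ("veo-3", 8, False): 160, ("veo-3", 8, True): 320,
    ("veo-3-fast", 4, False): 40, ("veo-3-fast", 4, True): 60,
    ("veo-3-fast", 6, False): 60, ("veo-3-fast", 6, True): 90,
    ("veo-3-fast", 8, False): 80, ("veo-3-fast", 8, True): 120,
}


def batch_cost_cents(clips) -> int:
    """Sum the per-call price for an iterable of (endpoint, seconds, audio) tuples."""
    return sum(RATE_CARD_CENTS[clip] for clip in clips)


def dollars(cents: int) -> str:
    """Format a non-negative amount in cents as a dollar string."""
    return f"${cents // 100:,}.{cents % 100:02d}"


# The four clips generated in this post.
this_post = [
    ("veo-3", 8, True),        # skincare ad
    ("veo-3", 8, True),        # Tokyo establishing shot
    ("veo-3-fast", 8, True),   # sourdough short
    ("veo-3-fast", 4, False),  # first request
]
print(dollars(batch_cost_cents(this_post)))  # $8.00
```

The same table reproduces the other figures in this post: `dollars(50 * RATE_CARD_CENTS[("veo-3", 8, True)])` gives "$160.00" for the agency batch, and `dollars(1000 * RATE_CARD_CENTS[("veo-3-fast", 8, True)])` gives "$1,200.00" for the MCN month.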

Want to learn more about Veo API pricing? Compare Veo 3 and Veo 3 Fast on Segmind, and choose the right model for your video workflow!

Veo API Strengths and Limitations

What Veo 3 does well: 

Veo 3 is strong for cinematic text-to-video generation, lifelike motion, and synchronized audiovisual output. The biggest practical win is that it can generate video and audio together, so teams do not always need a separate sound-design step for early drafts, ad concepts, or short cinematic clips. 

Where it falls short: 

Character continuity across multiple generations is unreliable without an image_url anchor. If you need strict character continuity across a 30-shot sequence, this model is not for you. If you need a hero clip or a stand-alone short, it is.

FAQs

What is the Veo API used for? 

The Veo API generates short cinematic videos (4 to 8 seconds) with synchronized audio from a text prompt. It is used for ad variants, pre-visualization, social-format shorts, and product visualization.

How do I use the Veo 3 API? 

Make a POST request to https://api.segmind.com/v1/veo-3 with your x-api-key header and a JSON body containing at minimum a prompt. The response body is an MP4 binary. Save it directly to a file.

How much does the Veo API cost? 

Veo 3 ranges from $0.80 (4 seconds, no audio) to $3.20 (8 seconds with audio). Veo 3 Fast ranges from $0.40 to $1.20 across the same matrix. You only pay for successful generations.

What is the difference between Veo 3 and Veo 3 Fast? 

Same API shape, same parameters. Veo 3 is the full quality model, best for cinematic work. Veo 3 Fast is cheaper per second, best for iteration and short-form social content where pixel-perfect quality is not the goal.

Can the Veo 3 API generate vertical videos for shorts? 

Yes. Set aspect_ratio: "9:16" in the request body. Other supported ratios are 16:9, 4:3, 1:1, and 3:4.

Can I convert an image to a video with the Veo API? 

Yes. Pass a publicly accessible URL to the optional image_url parameter, and the model will animate forward from that frame. Both veo-3 and veo-3-fast endpoints support this.

Conclusion

The Veo API works best when you treat it as a practical production layer, not just a demo model. The real value is not only that it generates short AI videos, but also that it brings prompt-based video, synchronized audio, aspect-ratio control, and API access into a single workflow.

The teams that get the most out of it will not use Veo 3 for every video task. They will use Veo 3 when quality matters, Veo 3 Fast when speed and iteration matter, and human review when the output needs brand polish or strict creative control. 

Sign up for Segmind and start using the Veo API today to create cinematic AI videos with synchronized audio from a simple prompt!