Veo 3 can generate videos – and soundtracks

Google's latest video-generating AI model, Veo 3, can create audio to go along with the clips it generates.

On Tuesday at the Google I/O 2025 developer conference, Google announced Veo 3, which the company claims can generate sound effects, background noise, and even dialogue to accompany the videos it creates. Google says Veo 3 also improves on its predecessor, Veo 2, in the quality of the footage it can produce.

Veo 3 is available starting Tuesday in Google's Gemini chatbot app for subscribers to Google's $249.99-per-month AI Ultra plan, where it can be prompted with text or an image.

"For the first time, we're emerging from the silent era of video generation," Demis Hassabis, CEO of Google DeepMind, said during a press briefing. "[You can give Veo 3] a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound."

The wide availability of tooling to build video generators has led to such an explosion of providers that the space is becoming saturated. Startups including Runway, Lightricks, Genmo, Pika, Higgsfield, Kling, and Luma, as well as tech giants such as OpenAI and Alibaba, are releasing models at a fast clip. In many cases, little distinguishes one model from another.

If Google can deliver on its promises, audio output stands to be a big differentiator for Veo 3. AI-powered sound-generating tools aren't novel, nor are models that create sound effects for video. However, according to Google, Veo 3 can understand the raw pixels from its videos and automatically sync the generated sound with the clip.

Veo 3 was likely made possible by DeepMind's earlier work in "video-to-audio" AI. Last June, DeepMind revealed that it was developing AI tech to generate soundtracks for videos by training a model on a combination of sounds and dialogue transcripts as well as video clips.

DeepMind won't say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo "may" be trained on some YouTube material.

To mitigate the risk of deepfakes, DeepMind says it is using its proprietary watermarking technology, SynthID, to embed invisible markers into the frames Veo 3 generates.

While companies like Google pitch Veo 3 as a powerful creative tool, many artists are wary of such tools because they threaten to upend entire industries. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimates that jobs in the industry will be disrupted by AI by 2026.

Google also today launched new capabilities for Veo 2, including a feature that lets users give the model images of characters, scenes, objects, and styles for better consistency. The latest Veo 2 can understand camera movements, such as rotations, dollies, and zooms, and it lets users add or remove objects from videos or broaden the frame of a clip to, for example, turn it from portrait to landscape.

Google says all of these new Veo 2 capabilities will come to its Vertex AI API platform in the coming weeks.