Google's Gemini Omni Flash combines reasoning and video generation in one model, so you can generate and edit videos through natural language. It is available now in the Gemini app, Google Flow, and YouTube Shorts.

Gemini Omni Flash: Google's New Model Creates Video from Any Input — Pricing and First Look

At Google I/O 2026, DeepMind CTO Koray Kavukcuoglu introduced Gemini Omni — a model family that combines Gemini's reasoning capabilities with native content generation, starting with video. The first release is Gemini Omni Flash, available now in the Gemini app, Google Flow, and YouTube Shorts.

This isn't a separate video-generation product like Veo. It's a single model that can accept images, audio, video, and text as input and produce coherent video output grounded in real-world knowledge. You can also edit existing videos through natural conversation.

What Makes Gemini Omni Different

Most video generation models (Sora, Veo, Runway) take a text prompt, sometimes with a reference image, and produce video. Gemini Omni is different in three ways:

1. Grounded in reasoning. Because Omni sits on top of Gemini's intelligence, the video output is informed by real-world knowledge. Ask it to "show how a car engine works" and the result reflects actual mechanical understanding, not just a visual aesthetic.

2. Conversational editing. You don't need to write new prompts for each change. Every instruction builds on the last — characters stay consistent, physics holds up, and the scene remembers what came before.

3. Any input, any output (starting with video). Omni can combine images, audio, video clips, and text as input. Google has stated that it will expand to image and audio output over time.

How It Works in Practice

From the official blog post and demo:

• Input: Mix of images, video clips, audio, and text instructions

• Output: Generated video that incorporates all input modalities

• Editing: Natural language commands like "change the background to a forest at sunset" — applied while maintaining character consistency

• Context: Every edit builds on the previous one, so you can iterate toward a specific result without restarting
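The iterative flow described above can be sketched as a small data structure. This is only an illustration of the idea that each edit is appended to a running history instead of replacing the prompt. The names `MediaInput`, `EditSession`, `edit`, and `build_request` are hypothetical and are not part of any Google SDK or published Omni API; the file name and prompts are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaInput:
    """One input item; kind is 'text', 'image', 'video', or 'audio'."""
    kind: str
    ref: str  # a prompt string or a file path, purely illustrative


@dataclass
class EditSession:
    """Hypothetical container for a conversational editing session."""
    inputs: List[MediaInput]
    instructions: List[str] = field(default_factory=list)

    def edit(self, instruction: str) -> "EditSession":
        # Append rather than replace, so earlier edits stay in context.
        self.instructions.append(instruction)
        return self

    def build_request(self) -> dict:
        # The original inputs plus the full, ordered edit history.
        return {
            "inputs": [{"kind": i.kind, "ref": i.ref} for i in self.inputs],
            "history": list(self.instructions),
        }


session = EditSession(inputs=[
    MediaInput("image", "hero.png"),
    MediaInput("text", "A hiker walks along a ridge"),
])
session.edit("change the background to a forest at sunset")
session.edit("make the hiker wave at the camera")

request = session.build_request()
print(request["history"])
```

Keeping the history as an ordered list is one simple way to model "every edit builds on the previous one"; how the actual model tracks context internally has not been described.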

The model is being rolled out to:

• Gemini app: for consumer creation

• Google Flow: Google's AI filmmaking tool

• YouTube Shorts: direct video creation for Shorts creators

Pricing

At launch, Gemini Omni Flash is available through:

1. Gemini app: included with Gemini Advanced subscriptions ($19.99/month)

2. Google Flow: part of the Flow platform pricing

3. YouTube Shorts: integrated into the Shorts creation experience

For API access via Vertex AI, Google hasn't published separate pricing yet. If Omni follows the precedent set by other Gemini models, API pricing will likely be tiered by compute requirements, with output cost scaling with resolution and duration. That is an expectation, not an announced price list.
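To make the "weighted by resolution and duration" idea concrete, here is a minimal sketch of how such an estimate could be computed. Everything in it is an assumption: the function `estimate_cost`, the `RESOLUTION_WEIGHT` table, and the rate used in the example are placeholders chosen for easy arithmetic, not published or leaked prices.

```python
# Placeholder multipliers, NOT real pricing: higher resolution costs more.
RESOLUTION_WEIGHT = {"720p": 1.0, "1080p": 2.0}


def estimate_cost(duration_s: float, resolution: str,
                  base_rate_per_s: float) -> float:
    """Return a cost estimate in arbitrary units.

    cost = duration (seconds) x resolution weight x base rate per second
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    if resolution not in RESOLUTION_WEIGHT:
        raise ValueError(f"unknown resolution: {resolution}")
    return duration_s * RESOLUTION_WEIGHT[resolution] * base_rate_per_s


# An 8-second clip at a made-up base rate of 0.5 units per second.
low = estimate_cost(8, "720p", base_rate_per_s=0.5)
high = estimate_cost(8, "1080p", base_rate_per_s=0.5)
print(low, high)
```

Under this kind of scheme, doubling either the clip length or the resolution weight doubles the estimate, which is the practical consequence to plan for once real rates are announced.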

Comparison with Other Video Generation Models

| Model | Input Types | Editing | Reasoning-Grounded | API Available |
|-------|-------------|---------|--------------------|---------------|
| Gemini Omni Flash | Text, Image, Video, Audio | Conversational | Yes | Coming |
| Veo 2 | Text, Image | Text prompts | No | Yes |
| OpenAI Sora | Text, Image | Text prompts | No | Limited |
| Runway Gen-4 | Text, Image | Text prompts + controls | No | Yes |

Omni's key differentiation is that it's not "just" a video model — it's a reasoning model that can also produce video. This means the same model can analyze a scene, understand physics, and generate accordingly.

Practical Implications

For content creators and developers, Gemini Omni Flash matters because:

• Fewer tools needed. You don't need to stitch together a text model for planning, an image model for storyboarding, and a video model for output. One model covers the whole path from plan to finished video.

• Iteration is faster. Conversational editing means you can dial in results without re-prompting from scratch.

• YouTube integration. Direct access from YouTube Shorts creation tools puts the model in front of a very large creator base as the rollout reaches them.

The bigger picture is what Google is calling "the agentic Gemini era" — and Omni fits squarely into that vision. Instead of using separate tools for separate tasks, agents that use Omni can reason, plan, create, and edit within a single model.

What's Next

Google has said that Omni will eventually support image and audio output in addition to video. The "anything from any input" framing suggests this is a long-term architectural bet, not a one-off product.

For now, Gemini Omni Flash is the first taste. You can try it in the Gemini app today.