Decrypt | 5/19/2026 | Tech | 2 min read

Google Unveils Gemini Omni, a New Multimodal AI Model


Quick Look

Google announced Gemini Omni, a new multimodal AI model combining its Gemini AI with media generation tools like Veo, Nano Banana, and Genie.
Launched at Google I/O 2026, it aims to create anything from any input, enhancing world understanding and editing capabilities.
The first release, Gemini Omni Flash, will be available via Flow and Flow Music.

AI-generated summary


Google on Tuesday introduced Gemini Omni, a new multimodal AI model that combines the company’s Gemini AI models with its media-generation tools, including Veo, Nano Banana, and Genie.

The announcement came during Google I/O 2026, where DeepMind CEO Demis Hassabis described Gemini Omni as “our new model that can create anything from any input.”

“It combines Gemini’s intelligence with the best of our generative media models for a new level of world understanding, multimodality, and editing,” Hassabis said.

Google said the first release, Gemini Omni Flash, will launch through Flow, the company’s AI filmmaking platform, and Flow Music, which focuses on AI-assisted music creation.

Calling Omni a “step towards artificial general intelligence,” Hassabis said Google has spent the past year extending Gemini into “a world model AI that can understand and simulate the world.”

Google’s Omni rollout builds on the popularity of Nano Banana, the company’s earlier AI image-editing model that helped push Gemini to the top of Apple’s App Store last September. Nano Banana became widely used for meme generation and conversational image editing, briefly helping Gemini overtake ChatGPT in app downloads and Google search interest for the first time since OpenAI’s chatbot launched in 2022.

In Decrypt’s comparison earlier this month, Nano Banana 2 outperformed OpenAI’s GPT Image 2 in anime illustration and spatial composition tests, while OpenAI’s model performed better with photorealism and text rendering. Google now appears to be extending many of those editing features into video through Gemini Omni.

During the presentation, Google demonstrated Omni generating a claymation-style educational video explaining protein folding. The company also showed conversational editing tools that modified a selfie video by adding new visual elements and changing the surrounding environment.

Google says Omni can keep characters, backgrounds, and movement consistent even after users make changes to a video, something many AI video models struggle with. The company also says Omni uses Gemini’s reasoning abilities to interpret broader instructions, so users can describe the kind of scene they want without spelling out every detail.

The company also introduced Flow Agent, an AI assistant integrated into Google Flow that can brainstorm scenes, organize assets, recommend plot changes, and batch-edit projects.

Additional updates include Flow Tools, which lets users build custom editing workflows from natural-language prompts, with no coding experience required.

Hassabis said Google is starting with video generation but plans to expand access to Omni, which he described as the long-term vision behind Gemini’s multimodal design.

“This was always our goal with Gemini, and why we built it to be multimodal from the very start,” he said.