Google has introduced Gemini Omni, a new AI model unveiled at its annual I/O developer conference on May 20, 2026. The model is designed to generate and edit videos from various input formats, including text, images, and audio, using natural language instructions.
According to Google's official announcement, Gemini Omni extends the capabilities of the existing Gemini family beyond text and image generation into video creation. Users can describe a scene or provide a rough clip, and the model can produce or modify video content accordingly.
The model supports conversational editing, allowing users to refine a video iteratively through dialogue. For example, a user could ask it to change the background or add an object, and the model would adjust the video in real time.
Google has not yet announced a public release date for Gemini Omni and says the model is currently in a limited testing phase. The company emphasized safety measures, including watermarking and content filters, intended to prevent misuse.