Gemini Omni leak: what Google's video model could mean

Ahead of Google I/O 2026, leaks point to a Gemini Omni video model. What's confirmed, what's still rumor, and what OmniArt creators should do this week.

OmniArt Team·2026-05-13

Google I/O 2026 lands on May 19–20, and the AI video corner of the internet is already pre-living the keynote. The reason is a single UI string spotted inside Gemini's video tab: "Start with an idea or try a template. Powered by Omni." From that one line, three waves of leaks have built a working picture of an unannounced Google video model — provisionally called Gemini Omni — that could either replace Veo 3.1, sit beside it, or quietly upgrade Google's entire generative stack.

This piece is for OmniArt creators trying to decide what, if anything, to do about it before Tuesday. We separate confirmed signals from speculation, walk through the three plausible identities for Omni, and end with the practical move for creators who need to ship video this week.

What we actually know (and don't)

Signal	Status	What it means
UI string "Powered by Omni" in Gemini's video tab	Confirmed in screenshots	A product called Omni is staged for release behind a feature flag
Model ID `bard_eac_video_generation_omni`	Reported via app inspection	An internal identifier is plumbed through Gemini's video pipeline
10-second clip limit	Reported by early testers	Suggests an early-stage or consumer-tier constraint rather than an API-tier limit
"Remix your videos, edit directly in chat, try a template"	Reported feature copy	Edit-and-remix workflows, not generate-only
Strong text coherence (e.g. math equations)	Reported in demo coverage	Notable technical advancement for in-video typography
Native audio	Not confirmed	Veo 3.1 ships native audio; Omni's status is unclear
API access	Not confirmed	Developers shouldn't plan around unconfirmed availability
Replaces, supplements, or rebrands Veo 3.1	Open question	The most important question for production teams

The honest summary: a Google video product called Omni is real enough to ship UI copy, but every architectural claim about it is still inference from app strings and tester reports.

The three plausible identities

Most of the uncertainty collapses into three scenarios for what Omni actually is. Each has different implications for the lineup of AI video tools creators rely on.

Scenario 1 — Consumer rebrand of Veo

The simplest reading: Omni is the consumer-facing replacement for "Veo" branding inside Gemini, similar to how Google consolidated its image generation behind "Nano Banana." Veo remains the underlying engine; Omni is the surface most users see.

If true, expect: minimal capability changes versus Veo 3.1, the same 8–10 second limits at the consumer tier, and Veo continuing on the enterprise/API track.

Scenario 2 — A Gemini-native video model

A second reading: Omni is a version of the Gemini architecture fine-tuned specifically for video, running parallel to the Veo track. Veo stays the dedicated video model for the API and enterprise; Omni is the consumer model that benefits from Gemini's text and reasoning ability.

If true, expect: stronger prompt adherence, better in-video typography (the math-equation reports support this), and tighter integration with Gemini's chat-based editing.

Scenario 3 — A true omni-modal system

The most ambitious reading: Omni is a single unified system that generates text, images, video, and audio natively from one model. The name itself ("Omni") suggests this scenario is the one Google is positioning toward, even if the launch lands shy of full parity.

If true, expect: meaningful workflow shifts toward conversational editing, multi-modal handoffs inside chat, and a longer-term challenge to the model-per-modality stack the rest of the field uses.

The most likely outcome at I/O is some blend of scenarios 2 and 3 — a Gemini-native video model with omni-modal ambitions but consumer-tier limits at launch.

Why the reported features matter

Three of the reported features deserve more attention than the model identity question itself, because they signal where the AI video category is heading regardless of who ships them first.

Conversational editing as the default

"Remix your videos, edit directly in chat" is the part of the leak that changes the workflow conversation. Most AI video tools today are still generate-and-download — you prompt, you wait, you save the clip, you re-prompt for changes. Chat-based editing reframes the model as a continuous collaborator: "make the second shot warmer," "swap the background," "extend by three seconds." If Omni ships this competently, it pressures every other model to match.

Templates as the on-ramp

Templates lower the prompt-engineering barrier for new users — a real benefit. They also flatten output diversity when everyone starts from the same shared prompt. The interesting question isn't whether templates ship, but whether they meaningfully outperform a well-written brief from scratch.

Text inside video

Reports of math equations rendering cleanly inside generated video are technically notable. In-video typography has been a visible weakness across major models. If Omni handles complex typography reliably, that opens explainer video, education, and motion-graphics workflows that previously needed a compositing pass.

How Omni would slot into the lineup

For creators who already work across multiple AI video models, the relevant question is where Omni fits, not whether it wins. Based on the reported features, the answer looks like this:

Capability	Gemini Omni (reported)	Veo 3.1 (confirmed)	V6 / R1	Sora 2
Duration	10s (reported)	Up to 8s	1–15s	Up to 20s
Resolution	Unknown	Up to 1080p	Up to 1080p	1080p, 4K available
Native audio	Not confirmed	Confirmed	Included	Included
Editing / remix	Reported: remix, chat, templates	Limited	Modify, Extend, multi-clip	Limited
API access	Not confirmed	Available	Available	Available
Strongest at	Conversational editing (reported)	Native 4K, spatial audio	Cinematic control, real-time	Long single takes

If the leaked feature set holds, Omni's lane is "conversational consumer video" — a sweet spot for quick social work and chat-driven iteration. The cinematic, broadcast, and multi-shot lanes stay with their current leaders until evidence says otherwise.

What this means for creators this week

The temptation with a pre-announcement leak is to wait. We'd push back on that for anyone with a deliverable in the next ten days.

Warning

Treat every Omni feature in the press as pre-announcement signal, not confirmed capability. Plans built on reported specs often need rework once the keynote lands.

The practical move depends on what you're shipping.

If you have video due this week

Use what's live and proven. V6 for cinematic shots, Veo 3.1 for native-4K broadcast cuts, Kling 3.0 for multilingual social variants, HappyHorse 1.0 for fast iteration. Inside OmniArt those are all one click apart, so you don't need to commit to any single tool ahead of the keynote.

If you're planning Q3 production

Build the brief around capabilities, not brands. Document what you actually need — duration, resolution, audio, editing model, character lock — and let the post-I/O lineup re-bid for the work in two weeks. If Omni ships and delivers, the brief plugs into it without rewriting the rest of the pipeline.
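
One way to make a capability-first brief concrete is to write it down as data and check it against each model's stated limits. The sketch below is illustrative only: `Brief`, `ModelProfile`, and `check` are hypothetical names, not part of any OmniArt or Google API. The capability values are transcribed from the comparison table earlier in this article, with `None` standing in for anything unknown or unconfirmed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Brief:
    """What the job needs, independent of any model brand."""
    min_duration_s: int
    min_resolution_p: int  # vertical resolution, e.g. 1080
    needs_native_audio: bool
    needs_api: bool


@dataclass(frozen=True)
class ModelProfile:
    """One model's capabilities; None means unknown or not confirmed."""
    name: str
    max_duration_s: Optional[int]
    max_resolution_p: Optional[int]
    native_audio: Optional[bool]
    api_access: Optional[bool]


def _at_least(limit: Optional[int], needed: int) -> Optional[bool]:
    """Compare a numeric limit to a requirement; unknown limits stay unknown."""
    return None if limit is None else limit >= needed


def _has(available: Optional[bool], required: bool) -> Optional[bool]:
    """A feature that is not required always passes; otherwise report availability."""
    return True if not required else available


def check(brief: Brief, model: ModelProfile) -> str:
    """Return 'fails', 'unverified', or 'fits' for one brief/model pair."""
    results = [
        _at_least(model.max_duration_s, brief.min_duration_s),
        _at_least(model.max_resolution_p, brief.min_resolution_p),
        _has(model.native_audio, brief.needs_native_audio),
        _has(model.api_access, brief.needs_api),
    ]
    if any(r is False for r in results):
        return "fails"
    if any(r is None for r in results):
        return "unverified"
    return "fits"


# Values transcribed from this article's comparison table, not independently verified.
# Omni's 10 s figure is a tester report, so treat any result involving it as provisional.
OMNI = ModelProfile("Gemini Omni (reported)", 10, None, None, None)
VEO = ModelProfile("Veo 3.1", 8, 1080, True, True)

social_cut = Brief(min_duration_s=8, min_resolution_p=1080,
                   needs_native_audio=True, needs_api=False)

for model in (OMNI, VEO):
    print(f"{model.name}: {check(social_cut, model)}")
```

With these inputs the brief comes back as "unverified" for Omni and "fits" for Veo 3.1. The three-way result is the design choice worth keeping: an unconfirmed spec is reported as unverified and is never quietly counted as a pass or a fail, so the same brief can be re-run once real specs are published.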

If you're researching and learning

Watch the keynote. Save tests, not opinions. The most valuable thing you can have post-launch is an apples-to-apples comparison run (same brief, same references, same evaluation rubric) across whatever ships at I/O, Veo 3.1, and the rest of the established lineup.
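
A shared rubric is easier to apply consistently when it is written down as weights and scored the same way for every model. The following is a minimal sketch under stated assumptions: the criteria, the weights, and the 1-to-5 scale are hypothetical choices, and the scores are placeholders for two unnamed models, not real test results.

```python
# Hypothetical rubric: criterion -> integer weight. Pick criteria that match your brief.
RUBRIC = {"prompt_adherence": 4, "text_legibility": 3, "motion_consistency": 3}


def weighted_score(scores: dict, rubric: dict) -> float:
    """Weighted average of 1-5 scores; every rubric criterion must be scored."""
    missing = set(rubric) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(rubric.values())
    return sum(rubric[c] * scores[c] for c in rubric) / total_weight


def rank(runs: dict, rubric: dict) -> list:
    """Sort models from highest to lowest weighted score."""
    scored = [(name, weighted_score(s, rubric)) for name, s in runs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder scores for two unnamed models on the same brief; not real test results.
runs = {
    "model_a": {"prompt_adherence": 5, "text_legibility": 3, "motion_consistency": 4},
    "model_b": {"prompt_adherence": 3, "text_legibility": 5, "motion_consistency": 4},
}

for name, score in rank(runs, RUBRIC):
    print(f"{name}: {score:.2f}")
```

Raising an error on a missing criterion is deliberate: a model that was never scored on text legibility should not be ranked as if it had been, which is the kind of gap that turns a comparison back into an opinion.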

The bigger shift Omni signals

Whatever Omni turns out to be, the leaks tell a clearer story about the category than they do about Google specifically.

The competitive surface is moving. First-pass visual quality is converging across the leaders. The real differentiation is shifting toward controllability, multi-shot consistency, audio-visual sync, conversational editing, and how well a model fits a real workflow — not which model wins a benchmark.

Costs are still real. Reports of usage limits and consumption tabs in Omni's UI suggest that high-fidelity video generation remains computationally expensive at scale. Templates and short clip caps are partly UX and partly economics.

Rights and remix get harder. Remix workflows on top of generated video introduce IP, consent, and commercial-use questions that text-to-video flows don't fully surface. Any team putting remix-based output into paid media should have the rights checklist ready before the feature ships.

How OmniArt plans to handle it

The OmniArt workspace adds models when they meet two bars: stable public availability and a real creative job that the existing lineup doesn't already cover. Gemini Omni, if and when it lands, will be evaluated against both.

If Omni ships at I/O and clears the bar, expect it in the workspace alongside Veo 3.1, Sora 2, V6, Kling 3.0, HappyHorse 1.0, Seedance 2.0, Runway Gen-4.5, Hailuo, and Grok Imagine — one prompt grammar, one balance, one place to compare it against the rest.

For background on the current video lineup, see the OmniArt video models tour. For how to write briefs that port cleanly across whichever model ends up running them, see the prompt-writing guide.

FAQ

Is Gemini Omni officially announced?

No. As of May 13, 2026, Google has not announced Gemini Omni. The product name, model ID, and feature copy come from app UI strings and reports from early testers. Google I/O 2026 (May 19–20) is the likely announcement window.

Will Gemini Omni replace Veo 3.1?

It's unclear. The three plausible scenarios are: Omni rebrands Veo for consumer surfaces, Omni runs alongside Veo as a Gemini-native consumer model, or Omni is a true omni-modal unified system. A blend of the second and third is most likely at launch.

What features are reported for Gemini Omni?

Reported features include conversational editing inside Gemini chat, a remix workflow, prompt templates, strong text coherence inside video (math equations rendered cleanly), and a 10-second clip limit. None of these are officially confirmed.

Should I wait for Omni before producing video this week?

No. Use the models that are live and stable today. The lineup already covers cinematic shots, native-4K broadcast, multilingual social, fast iteration, multi-shot continuity, and frame-level VFX. If Omni ships and clears the bar, you can swap it in without rewriting the rest of the pipeline.

How does Omni compare to Veo 3.1?

Based on reported specs, Omni's edge is conversational editing and possibly in-video typography; Veo 3.1's confirmed strengths are native audio and 4K output. Direct comparison isn't possible until Omni is publicly available.
