Let your creativity shine: Is Google's $249/month Gemini Ultra text-to-video tool worth it?
The AI race is wild, with no rulebook. In this world-without-rules, anyone can be a creator.
That's the exciting part. There's a not-so-exciting part (read till the end).
Enter Google’s Gemini Deep Think (version 2.5), a multimodal powerhouse, blending reasoning, reinforcement learning, text, image, and now, video capabilities.
Its latest party trick, the photo-to-video feature, lets paid users (on Ultra and Pro plans) turn still images into 8-second video clips with sound, using text prompts.
Launched July 10, 2025, on the Gemini web app (with mobile rollout soon after), it’s powered by Google’s Veo 3 model, which also fuels their Flow filmmaking tool.
The Ultra package also includes advances in parallel reasoning & reinforcement learning (RL) to tackle tough math and science problems.
It's being peddled as "better" than Gemini Pro, OpenAI o3 and Grok 4 on several metrics (namely mathematics, coding, reasoning and knowledge).
Just upload a picture, add a text description, and boom: an MP4 video at 720p in 16:9 format. The Veo 3 model behind it, released at Google's May developer conference, was first available in a paid filmmaking app called Flow.
Now it’s built into Gemini’s chat interface, helping Google keep up with big players like OpenAI, Runway AI, and even China’s Alibaba, Manus, and Kuaishou, who’ve all been flexing new video tools.
The catch?
It’s not perfect — facial features can shift in videos, and complex prompts like breakdancing might flop, reports Bloomberg.
Google says there’s no intentional tweaking of appearances, but the tech’s still new and can be wonky with faces. It’s better at animating objects, drawings, or nature shots, and they’re working on improving it.
Still, it’s a step toward making AI video editing accessible and creative.
Nano Banana, officially Gemini 2.5 Flash Image, is Google’s top-tier image editing model, integrated into the Gemini app as of August 2025.
Nano Banana has gone viral (or, more aptly, bananas).
Why? Its ability to edit photos with natural language prompts (no Photoshop skills needed) has turned heads.
Think changing outfits, swapping backgrounds, or blending images while keeping faces consistent.
It’s fast (1-2 seconds per edit), reliable, and reportedly tops LMArena’s image-editing leaderboard. You can use it free (up to 100 edits a day) or on a paid basis through the Gemini API (for developers) or Vertex AI (for enterprises).
It’s also watermark-free on apps like Imogen, unlike Gemini’s branded outputs.
Ease of use: Both tools let you edit images or create videos with simple text prompts, no technical expertise required. Want a cat talking or a selfie in a mediaeval sari? Done in seconds.
Creative freedom: Nano Banana’s multi-turn editing remembers past changes, letting you refine images iteratively (e.g., add a bookshelf, then a couch). The photo-to-video tool animates everyday objects or nature shots, sparking endless creative possibilities.
Time and cost savings: Nano Banana cuts editing time from hours to minutes, saving professionals (marketers, influencers, designers) big bucks on photography or software. Its low per-image price (e.g., $0.039/image for Gemini 2.5 Flash Image) undercuts rivals like OpenAI.
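To put the quoted prices in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the $0.039/image figure and the $249/month Ultra price mentioned in this article; the monthly volumes are hypothetical, and the break-even number is a loose comparison at best, since the Ultra plan bundles video and other features that per-image pricing does not cover.

```python
import math

PRICE_PER_IMAGE_USD = 0.039  # per-image price quoted in the article
ULTRA_MONTHLY_USD = 249.0    # Ultra subscription price from the headline


def pay_per_image_cost(images: int, price: float = PRICE_PER_IMAGE_USD) -> float:
    """Total spend in USD for a given number of generated or edited images."""
    return round(images * price, 2)


def break_even_images(flat_fee: float, price: float = PRICE_PER_IMAGE_USD) -> int:
    """Smallest image count whose pay-per-image cost reaches the flat fee."""
    return math.ceil(flat_fee / price)


if __name__ == "__main__":
    for n in (100, 1_000, 5_000):  # hypothetical monthly volumes
        print(f"{n:>5} images -> ${pay_per_image_cost(n):.2f}")
    print("break-even vs. flat fee:", break_even_images(ULTRA_MONTHLY_USD), "images")
```

On these assumptions, a creator would need several thousand images a month before pay-per-image spending caught up with the flat subscription fee.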
It's a good question. The jury is still out on where these powerful AI video editing and tweaking tools will take all of us.
But, like the arrival of petrol engine cars or digital cameras, AI video and image editing is set to shake things up.
A few thoughts on how it could change things:
Democratised creativity
Just as cars democratised mobility, tools like Gemini and Nano Banana make pro-level editing accessible to anyone with a phone, levelling the playing field for small creators, businesses, or hobbyists. Expect a flood of personalised ads, social media content, and indie films. Hollywood, eat your heart out!
Industry disruption
E-commerce and marketing are already using Nano Banana to generate product visuals or campaign mock-ups, slashing costs. Video tools could streamline film pre-production or create hyper-realistic virtual sets, challenging traditional workflows.
As AI gets better at mimicking reality, risks like deepfakes or misinformation grow.
Google’s got guardrails: no videos of celebs, presidents, or well-known CEOs, and nothing promoting violence or bullying.
SynthID watermarks also aim to curb misuse, but the tech’s accessibility could spark debates over authenticity and regulation.
The AI race is wild, with Google, OpenAI, and others pushing boundaries. Gemini and Nano Banana signal a future where anyone can be a creator, but the world will need to grapple with the blurry line between real and fake.
Feature/Aspect | Gemini AI (Google) | ChatGPT (OpenAI) | Perplexity | DeepSeek AI | Claude (Anthropic) | Grok (xAI) |
---|---|---|---|---|---|---|
Primary Focus | Multimodal tasks, productivity, Google ecosystem integration | General-purpose conversation, creative content, coding | Real-time fact retrieval, academic research | Code generation, technical tasks, cost-effective NLP | Safe, reliable responses, long-form reasoning | Real-time answers, witty tone, X platform integration |
Multimodal Capabilities | Strong: Text, images, video (Veo 3), audio; excels in Google Docs, Gmail integration | Strong: Text, images, data analysis; supports image generation and file processing | Moderate: Text, images; video via Veo 3; focused on search-based outputs | Limited: Primarily text, some code generation | Moderate: Text, some image support via plugins; no native video generation | Moderate: Text, images via Aurora; no native video generation |
Real-Time Data Access | Yes, via Google Search integration; fast for web-based queries | Yes, with Search and Deep Research; less consistent for recent data | Yes, excels in real-time web search with citations | Limited: Relies on training data, no native real-time search | Limited: Supports web searches via plugins, not as seamless | Yes, strong real-time access via X platform integration |
Research Capabilities | Deep Research mode for multi-step tasks; less accurate than Perplexity (6.2% on Humanity’s Last Exam) | Deep Research mode; high accuracy (26.6% on Humanity’s Last Exam) | Deep Research mode; fast (2-4 min), accurate (21.1% on Humanity’s Last Exam) | Strong for technical research (e.g., STEM); no dedicated Deep Research mode | Strong for long-form analysis, but no dedicated real-time research mode | Real-time research with citations; less accurate (3.8% on Humanity’s Last Exam) |
Coding Performance | Good for general coding; less polished than ChatGPT | Excellent: Polished, user-friendly code (e.g., password generators with UI flair) | Good for quick code explanations, debugging; less advanced than ChatGPT | Excellent: Specialized in code generation, debugging, STEM tasks | Strong: Excels in coding with clear explanations | Moderate: Functional but less refined than ChatGPT or DeepSeek |
Creativity | High: Excels in creative writing, multimedia content generation | Very High: Best for storytelling, imaginative content | Moderate: Less suited for creative tasks, prioritizes factual accuracy | Low: Focused on technical tasks, not creative writing | High: Strong for thoughtful writing, captures user style well | Moderate: Witty but less creative than ChatGPT or Gemini |
Accuracy | Moderate: May misinterpret sources or provide outdated links | Moderate: Requires fact-checking; struggles with context retention | High: Excels in factual accuracy with extensive citations | High: Accurate for technical domains, less so for general knowledge | High: Prioritizes safe, reliable responses | Moderate: Potential bias, less accurate for complex research |
Cost/Accessibility | Free tier; Deep Research requires Gemini Advanced ($20/month) | Free tier; Deep Research requires Pro ($200/month) | Free tier (5 queries/day); Pro ($20/month) offers 500 Deep Research queries | Cost-effective; open-source options available | Subscription-based; pricing varies by model (Haiku, Sonnet, Opus) | Free tier; integrated with X, no clear pricing for advanced features |
Unique Edge | Seamless Google ecosystem integration (Docs, Gmail, Cloud); strong video generation | Versatile all-rounder; memory feature for personalized interactions | Fast, citation-rich research; accessible pricing for Deep Research | Cost-effective, open-source; excels in STEM and Chinese NLP | Safety-focused; excellent for long-form reasoning and analysis | Real-time data via X; unique, humorous tone |
Weaknesses | Less accurate for research; subscription barrier for advanced features | Inconsistent real-time data; high cost for Deep Research | Limited creativity; reliant on web sources | Limited multimodal capabilities; niche focus on technical tasks | Limited real-time search; less versatile for casual tasks | Less mature, still catching up to competitors; potential bias |
Best Use Cases | Productivity, multimodal apps, Google Cloud users | Creative writing, coding, general-purpose tasks | Academic research, fact-checking, quick data retrieval | Technical research, coding, cost-sensitive applications | Complex reasoning, safe content generation | Real-time Q&A, casual conversations, X-based research |