Google Gemini AI and Nano Banana: What you need to know

Let your creativity shine: Is Google’s $249.99/month Gemini Ultra text-to-video tool worth it?

Last updated: September 04, 2025 | 13:43

Jay Hilotin, Senior Assistant Editor


Gemini Nano Banana: Tiny tech, mighty flavour.

X | @FutureStacked

The AI race is wild, with no rulebook. In this world-without-rules, anyone can be a creator.

That's the exciting part. There's a not-so-exciting part (read till the end).

Enter Google’s Gemini 2.5 Deep Think, a multimodal powerhouse that blends reasoning and reinforcement learning with text, image and, now, video capabilities.

Latest trick: Gemini Ultra and Pro

Google is giving its Gemini AI a new party trick: photo-to-video magic. The feature lets paid users (on the Ultra and Pro plans) turn still images into 8-second video clips with sound, guided by text prompts.

Launched on July 10, 2025, on the Gemini web app (with a mobile rollout soon after), it’s powered by Google’s Veo 3 model, which also fuels the company’s Flow filmmaking tool.

The Ultra package also includes advances in parallel reasoning and reinforcement learning (RL) to tackle tough maths and science problems.

It’s being peddled as "better" than Gemini Pro, OpenAI o3 and Grok 4 on several metrics, including mathematics, coding, reasoning and knowledge.

How does it work?

Just upload a picture, add a text description, and boom: an MP4 video at 720p in 16:9 format. Veo 3, unveiled at Google’s developer conference in May, was first available in a paid filmmaking app called Flow.

Now it’s built into Gemini’s chat interface, helping Google keep up with big players like OpenAI, Runway AI, and even China’s Alibaba, Manus, and Kuaishou, who’ve all been flexing new video tools.

The catch?

It’s not perfect: facial features can shift in videos, and complex prompts, such as breakdancing, might flop, Bloomberg reports.

$249.99/month

Monthly cost of Gemini Ultra

Google says there’s no intentional tweaking of appearances, but the tech’s still new and can be wonky with faces. It’s better at animating objects, drawings, or nature shots, and the company says it is working on improving it.

Still, it’s a step toward making AI video editing accessible and creative.

Nano Banana

Nano Banana, officially Gemini 2.5 Flash Image, is Google’s top-tier image editing model, integrated into the Gemini app as of August 2025.

Nano Banana has gone viral (or, more aptly, bananas).

Why? Its ability to edit photos with natural language prompts (no Photoshop skills needed) has turned heads.

Think changing outfits, swapping backgrounds, or blending images while keeping faces consistent.

It’s fast (1-2 seconds per edit), reliable, and reportedly tops LMArena’s image-editing leaderboard. You can use it for free (up to 100 edits a day) or through paid options.

It’s also watermark-free on apps like Imogen, unlike Gemini’s branded outputs.

The model is available through the Gemini API for developers or Vertex AI for enterprises.

Benefits for users

Ease of use: Both tools let you edit images or create videos with simple text prompts, no technical expertise required. Want a cat talking or a selfie in a mediaeval sari? Done in seconds.
Creative freedom: Nano Banana’s multi-turn editing remembers past changes, letting you refine images iteratively (e.g., add a bookshelf, then a couch). The photo-to-video tool animates everyday objects or nature shots, sparking endless creative possibilities.
Time and cost savings: Nano Banana cuts editing time from hours to minutes, saving professionals (marketers, influencers, designers) big bucks on photography or software. Its low API pricing (about $0.039 per image for Gemini 2.5 Flash Image) undercuts rivals like OpenAI.

3.9 cents

Price for output images up to 1024 x 1024px using Nano Banana/Gemini 2.5 Flash Image (equivalent to $30 per 1 million output "tokens").
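The 3.9-cent figure follows directly from the per-token price. Below is a minimal sketch of that arithmetic, assuming Google's launch figure of 1,290 output tokens per image (a number not stated in this article, so check current pricing before relying on it); the constant and function names are illustrative only.

```python
# Rough cost sketch for Gemini 2.5 Flash Image ("Nano Banana") output images.
# Assumption: each output image (up to 1024 x 1024 px) counts as 1,290
# output tokens, the figure Google cited at launch. Names are hypothetical.

PRICE_PER_MILLION_OUTPUT_TOKENS = 30.00  # US dollars, as quoted above
TOKENS_PER_IMAGE = 1290                  # assumed tokens per output image


def cost_per_image(price_per_million: float = PRICE_PER_MILLION_OUTPUT_TOKENS,
                   tokens_per_image: int = TOKENS_PER_IMAGE) -> float:
    """Return the dollar cost of generating one output image."""
    return price_per_million * tokens_per_image / 1_000_000


def batch_cost(num_images: int) -> float:
    """Return the dollar cost of generating num_images output images."""
    return num_images * cost_per_image()


if __name__ == "__main__":
    print(f"Per image: ${cost_per_image():.4f}")     # Per image: $0.0387
    print(f"1,000 images: ${batch_cost(1000):.2f}")  # 1,000 images: $38.70
```

At $30 per million tokens, 1,290 tokens come to about $0.0387, which rounds to the 3.9 cents quoted above.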

What's next?

It’s a good question. The jury is still out on where these powerful AI video- and image-editing tools will take us all.

But, like the arrival of petrol-engine cars or digital cameras, AI video and image editing is set to shake things up.

A few thoughts on how it could change things:

Democratised creativity

Just as cars democratised mobility, tools like Gemini and Nano Banana make pro-level editing accessible to anyone with a phone, levelling the playing field for small creators, businesses, or hobbyists. Expect a flood of personalised ads, social media content, and indie films. Hollywood, eat your heart out!

Industry disruption

E-commerce and marketing are already using Nano Banana to generate product visuals or campaign mock-ups, slashing costs. Video tools could streamline film pre-production or create hyper-realistic virtual sets, challenging traditional workflows.

Not-so-exciting ethical questions

As AI gets better at mimicking reality, risks like deepfakes or misinformation grow.

Google’s got guardrails: no videos of celebs, presidents, or well-known CEOs, and nothing promoting violence or bullying.

SynthID watermarks also aim to curb misuse, but the tech’s accessibility could spark debates over authenticity and regulation.

The AI race is wild, with Google, OpenAI, and others pushing boundaries. Gemini and Nano Banana signal a future where anyone can be a creator, but the world will need to grapple with the blurry line between real and fake.

Gemini AI vs Others

Feature/Aspect	Gemini AI (Google)	ChatGPT (OpenAI)	Perplexity	DeepSeek AI	Claude (Anthropic)	Grok (xAI)
Primary Focus	Multimodal tasks, productivity, Google ecosystem integration	General-purpose conversation, creative content, coding	Real-time fact retrieval, academic research	Code generation, technical tasks, cost-effective NLP	Safe, reliable responses, long-form reasoning	Real-time answers, witty tone, X platform integration
Multimodal Capabilities	Strong: Text, images, video (Veo 3), audio; excels in Google Docs, Gmail integration	Strong: Text, images, data analysis; supports image generation and file processing	Moderate: Text, images; video via Veo 3; focused on search-based outputs	Limited: Primarily text, some code generation	Moderate: Text, some image support via plugins; no native video generation	Moderate: Text, images via Aurora; no native video generation
Real-Time Data Access	Yes, via Google Search integration; fast for web-based queries	Yes, with Search and Deep Research; less consistent for recent data	Yes, excels in real-time web search with citations	Limited: Relies on training data, no native real-time search	Limited: Supports web searches via plugins, not as seamless	Yes, strong real-time access via X platform integration
Research Capabilities	Deep Research mode for multi-step tasks; less accurate than Perplexity (6.2% on Humanity’s Last Exam)	Deep Research mode; high accuracy (26.6% on Humanity’s Last Exam)	Deep Research mode; fast (2-4 min), accurate (21.1% on Humanity’s Last Exam)	Strong for technical research (e.g., STEM); no dedicated Deep Research mode	Strong for long-form analysis, but no dedicated real-time research mode	Real-time research with citations; less accurate (3.8% on Humanity’s Last Exam)
Coding Performance	Good for general coding; less polished than ChatGPT	Excellent: Polished, user-friendly code (e.g., password generators with UI flair)	Good for quick code explanations, debugging; less advanced than ChatGPT	Excellent: Specialized in code generation, debugging, STEM tasks	Strong: Excels in coding with clear explanations	Moderate: Functional but less refined than ChatGPT or DeepSeek
Creativity	High: Excels in creative writing, multimedia content generation	Very High: Best for storytelling, imaginative content	Moderate: Less suited for creative tasks, prioritizes factual accuracy	Low: Focused on technical tasks, not creative writing	High: Strong for thoughtful writing, captures user style well	Moderate: Witty but less creative than ChatGPT or Gemini
Accuracy	Moderate: May misinterpret sources or provide outdated links	Moderate: Requires fact-checking; struggles with context retention	High: Excels in factual accuracy with extensive citations	High: Accurate for technical domains, less so for general knowledge	High: Prioritizes safe, reliable responses	Moderate: Potential bias, less accurate for complex research
Cost/Accessibility	Free tier; Deep Research requires Gemini Advanced ($20/month)	Free tier; Deep Research requires Pro ($200/month)	Free tier (5 queries/day); Pro ($20/month) offers 500 Deep Research queries	Cost-effective; open-source options available	Subscription-based; pricing varies by model (Haiku, Sonnet, Opus)	Free tier; integrated with X, no clear pricing for advanced features
Unique Edge	Seamless Google ecosystem integration (Docs, Gmail, Cloud); strong video generation	Versatile all-rounder; memory feature for personalized interactions	Fast, citation-rich research; accessible pricing for Deep Research	Cost-effective, open-source; excels in STEM and Chinese NLP	Safety-focused; excellent for long-form reasoning and analysis	Real-time data via X; unique, humorous tone
Weaknesses	Less accurate for research; subscription barrier for advanced features	Inconsistent real-time data; high cost for Deep Research	Limited creativity; reliant on web sources	Limited multimodal capabilities; niche focus on technical tasks	Limited real-time search; less versatile for casual tasks	Less mature, still catching up to competitors; potential bias
Best Use Cases	Productivity, multimodal apps, Google Cloud users	Creative writing, coding, general-purpose tasks	Academic research, fact-checking, quick data retrieval	Technical research, coding, cost-sensitive applications	Complex reasoning, safe content generation	Real-time Q&A, casual conversations, X-based research
