Google Gemini AI and Nano Banana: What you need to know

Let your creativity shine: Is the $249/month Google Gemini Ultra text-to-video tool worth it?

Jay Hilotin, Senior Assistant Editor
Gemini Nano Banana: Tiny tech, mighty flavour.
X | @FutureStacked

The AI race is wild, with no rulebook. In this world-without-rules, anyone can be a creator.

That's the exciting part. There's a not-so-exciting part (read till the end).

Enter Google’s Gemini Deep Think (version 2.5), a multimodal powerhouse, blending reasoning, reinforcement learning, text, image, and now, video capabilities.

Latest trick: Gemini Ultra and Pro

Google is giving its Gemini AI a new party trick: photo-to-video magic. The feature lets paid users (on Ultra and Pro plans) turn still images into 8-second video clips with sound, using text prompts.

Launched July 10, 2025, on the Gemini web app (with mobile rollout soon after), it’s powered by Google’s Veo 3 model, which also fuels their Flow filmmaking tool.

The Ultra package also includes advances in parallel reasoning & reinforcement learning (RL) to tackle tough math and science problems.

It's being peddled as "better" than Gemini Pro, OpenAI o3 and Grok 4 on several metrics (namely mathematics, coding, reasoning and knowledge).

How does it work?

Just upload a picture, add a text description, and boom: an MP4 video at 720p in 16:9 format. The underlying Veo 3 model (dropped at Google’s May developer conference) was first available in a paid filmmaking app called Flow.

Now it’s built into Gemini’s chat interface, helping Google keep up with big players like OpenAI, Runway AI, and even China’s Alibaba, Manus, and Kuaishou, who’ve all been flexing new video tools.

The catch?

It’s not perfect — facial features can shift in videos, and complex prompts like breakdancing might flop, reports Bloomberg.

$249.99/month
Cost of Gemini Ultra

Google says there’s no intentional tweaking of appearances, but the tech’s still new and can be wonky with faces. It’s better at animating objects, drawings, or nature shots, and they’re working on improving it.

Still, it’s a step toward making AI video editing accessible and creative.

Nano Banana

Nano Banana, officially Gemini 2.5 Flash Image, is Google’s top-tier image editing model, integrated into the Gemini app as of August 2025.

Nano Banana has gone viral (or, more aptly, bananas).

Why? Its ability to edit photos with natural language prompts (no Photoshop skills needed) has turned heads.

Think changing outfits, swapping backgrounds, or blending images while keeping faces consistent.

It’s fast (1-2 seconds per edit), reliable, and reportedly tops LMArena’s image-editing leaderboard. You can use it free (100 edits/day), while developers can reach it through the paid Gemini API and enterprises through Vertex AI.

It’s also watermark-free on apps like Imogen, unlike Gemini’s branded outputs.

Benefits for users

  • Ease of use: Both tools let you edit images or create videos with simple text prompts, no technical expertise required. Want a talking cat or a selfie in a mediaeval sari? Done in seconds.

  • Creative freedom: Nano Banana’s multi-turn editing remembers past changes, letting you refine images iteratively (e.g., add a bookshelf, then a couch). The photo-to-video tool animates everyday objects or nature shots, sparking endless creative possibilities.

  • Time and cost savings: Nano Banana cuts editing time from hours to minutes, saving professionals (marketers, influencers, designers) big bucks on photography or software. Its low per-image price ($0.039/image for Gemini 2.5 Flash Image) undercuts rivals like OpenAI.

3.9 cents
Price for output images up to 1024 x 1024px using Nano Banana/Gemini 2.5 Flash Image (equivalent to $30 per 1 million output "tokens").
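
To see how the two quoted prices fit together, here is a minimal Python sketch. It uses only the figures above ($0.039 per image and $30 per 1 million output tokens). The function names are hypothetical, and the tokens-per-image figure is inferred from those two numbers, not an official specification.

```python
# Figures quoted in the article; treat them as assumptions that may change.
PRICE_PER_IMAGE_USD = 0.039
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0


def implied_tokens_per_image(price_per_image, price_per_million_tokens):
    """Output tokens one image must consume for the two quoted prices to agree."""
    return price_per_image / price_per_million_tokens * 1_000_000


def batch_cost_usd(num_images, price_per_image=PRICE_PER_IMAGE_USD):
    """Estimated cost of num_images output images at a flat per-image price."""
    return num_images * price_per_image


tokens = implied_tokens_per_image(
    PRICE_PER_IMAGE_USD, PRICE_PER_MILLION_OUTPUT_TOKENS_USD
)
print(f"Implied output tokens per image: {tokens:.0f}")
print(f"Estimated cost of 1,000 images: ${batch_cost_usd(1000):.2f}")
```

On these numbers, each image works out to roughly 1,300 output tokens, and a batch of 1,000 images would cost about $39.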

What's next?

It's a good question. The jury is still out on where these powerful AI video-editing and image-tweaking tools will take all of us.

But, like the arrival of petrol engine cars or digital cameras, AI video and image editing is set to shake things up.

A few thoughts on how it could change things:

  • Democratised creativity:

    Just as cars democratised mobility, tools like Gemini and Nano Banana make pro-level editing accessible to anyone with a phone, levelling the playing field for small creators, businesses, or hobbyists. Expect a flood of personalised ads, social media content, and indie films. Hollywood, eat your heart out!

  • Industry disruption:

    E-commerce and marketing are already using Nano Banana to generate product visuals or campaign mock-ups, slashing costs. Video tools could streamline film pre-production or create hyper-realistic virtual sets, challenging traditional workflows.

Not-so-exciting ethical questions

As AI gets better at mimicking reality, risks like deepfakes or misinformation grow.

Google’s got guardrails: no videos of celebs, presidents, or well-known CEOs, and nothing promoting violence or bullying.

SynthID watermarks also aim to curb misuse, but the tech’s accessibility could spark debates over authenticity and regulation.

The AI race is wild, with Google, OpenAI, and others pushing boundaries. Gemini and Nano Banana signal a future where anyone can be a creator, but the world will need to grapple with the blurry line between real and fake.

Gemini AI vs Others

| Feature/Aspect | Gemini AI (Google) | ChatGPT (OpenAI) | Perplexity | DeepSeek AI | Claude (Anthropic) | Grok (xAI) |
| --- | --- | --- | --- | --- | --- | --- |
| Primary Focus | Multimodal tasks, productivity, Google ecosystem integration | General-purpose conversation, creative content, coding | Real-time fact retrieval, academic research | Code generation, technical tasks, cost-effective NLP | Safe, reliable responses, long-form reasoning | Real-time answers, witty tone, X platform integration |
| Multimodal Capabilities | Strong: Text, images, video (Veo 3), audio; excels in Google Docs, Gmail integration | Strong: Text, images, data analysis; supports image generation and file processing | Moderate: Text, images; video via Veo 3; focused on search-based outputs | Limited: Primarily text, some code generation | Moderate: Text, some image support via plugins; no native video generation | Moderate: Text, images via Aurora; no native video generation |
| Real-Time Data Access | Yes, via Google Search integration; fast for web-based queries | Yes, with Search and Deep Research; less consistent for recent data | Yes, excels in real-time web search with citations | Limited: Relies on training data, no native real-time search | Limited: Supports web searches via plugins, not as seamless | Yes, strong real-time access via X platform integration |
| Research Capabilities | Deep Research mode for multi-step tasks; less accurate than Perplexity (6.2% on Humanity’s Last Exam) | Deep Research mode; high accuracy (26.6% on Humanity’s Last Exam) | Deep Research mode; fast (2-4 min), accurate (21.1% on Humanity’s Last Exam) | Strong for technical research (e.g., STEM); no dedicated Deep Research mode | Strong for long-form analysis, but no dedicated real-time research mode | Real-time research with citations; less accurate (3.8% on Humanity’s Last Exam) |
| Coding Performance | Good for general coding; less polished than ChatGPT | Excellent: Polished, user-friendly code (e.g., password generators with UI flair) | Good for quick code explanations, debugging; less advanced than ChatGPT | Excellent: Specialized in code generation, debugging, STEM tasks | Strong: Excels in coding with clear explanations | Moderate: Functional but less refined than ChatGPT or DeepSeek |
| Creativity | High: Excels in creative writing, multimedia content generation | Very High: Best for storytelling, imaginative content | Moderate: Less suited for creative tasks, prioritizes factual accuracy | Low: Focused on technical tasks, not creative writing | High: Strong for thoughtful writing, captures user style well | Moderate: Witty but less creative than ChatGPT or Gemini |
| Accuracy | Moderate: May misinterpret sources or provide outdated links | Moderate: Requires fact-checking; struggles with context retention | High: Excels in factual accuracy with extensive citations | High: Accurate for technical domains, less so for general knowledge | High: Prioritizes safe, reliable responses | Moderate: Potential bias, less accurate for complex research |
| Cost/Accessibility | Free tier; Deep Research requires Gemini Advanced ($20/month) | Free tier; Deep Research requires Pro ($200/month) | Free tier (5 queries/day); Pro ($20/month) offers 500 Deep Research queries | Cost-effective; open-source options available | Subscription-based; pricing varies by model (Haiku, Sonnet, Opus) | Free tier; integrated with X, no clear pricing for advanced features |
| Unique Edge | Seamless Google ecosystem integration (Docs, Gmail, Cloud); strong video generation | Versatile all-rounder; memory feature for personalized interactions | Fast, citation-rich research; accessible pricing for Deep Research | Cost-effective, open-source; excels in STEM and Chinese NLP | Safety-focused; excellent for long-form reasoning and analysis | Real-time data via X; unique, humorous tone |
| Weaknesses | Less accurate for research; subscription barrier for advanced features | Inconsistent real-time data; high cost for Deep Research | Limited creativity; reliant on web sources | Limited multimodal capabilities; niche focus on technical tasks | Limited real-time search; less versatile for casual tasks | Less mature, still catching up to competitors; potential bias |
| Best Use Cases | Productivity, multimodal apps, Google Cloud users | Creative writing, coding, general-purpose tasks | Academic research, fact-checking, quick data retrieval | Technical research, coding, cost-sensitive applications | Complex reasoning, safe content generation | Real-time Q&A, casual conversations, X-based research |
