Gemini stage presentation at Made by Google '24. Image Credits: Maxwell Zeff

Google's next major AI model has arrived to combat a slew of new offerings from OpenAI.

On Wednesday, Google announced Gemini 2.0 Flash, which the company says can natively generate images and audio in addition to text. 2.0 Flash can also use third-party apps and services, allowing it to tap into Google Search, execute code, and more.

An experimental release of 2.0 Flash will be available through the Gemini API and Google's AI developer platforms, AI Studio and Vertex AI, starting today. However, the audio and image generation capabilities are launching only for "early access partners" ahead of a wider rollout in January.
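
For developers who want to try that experimental release, a minimal sketch of a text request through the Gemini API might look like the following. The `google-genai` Python SDK, the `gemini-2.0-flash-exp` model identifier, and the environment variable name are assumptions based on Google's developer documentation, not details from this article.

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model via the
# Gemini API. Assumes the google-genai SDK is installed (pip install google-genai)
# and that GEMINI_API_KEY is set. The model name "gemini-2.0-flash-exp" is an
# assumption taken from Google's developer docs, not from this article.
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarize what 'natively multimodal' means in one sentence.",
)
print(response.text)
```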

In the coming months, Google says it'll bring 2.0 Flash in a range of flavors to products like Android Studio, Chrome DevTools, Firebase, Gemini Code Assist, and others.

Flash, upgraded

The first-gen Flash, 1.5 Flash, could only generate text, and it wasn't designed for especially demanding workloads. This new model is more versatile, Google says, in part because it can call tools like Search and interact with external APIs.
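
As a rough illustration of that tool-calling ability, the sketch below enables the built-in Google Search tool on a single request. The `Tool`/`GoogleSearch` config shape is an assumption drawn from the SDK's documentation rather than anything stated in the article.

```python
# Sketch: asking 2.0 Flash to ground an answer with the built-in Google Search
# tool. The GenerateContentConfig/Tool/GoogleSearch types are assumptions based
# on the google-genai SDK docs; adjust to whatever the current SDK exposes.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google announce about Gemini 2.0 Flash this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```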

"We know Flash is extremely popular with developers for its … balance of speed and performance," Tulsee Doshi, head of product for Gemini models at Google, said during a briefing Tuesday. "And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful."

Google claims that 2.0 Flash, which is twice as fast as the company's Gemini 1.5 Pro model on certain benchmarks per Google's own testing, is "significantly" better in areas like coding and image analysis. In fact, the company says, 2.0 Flash displaces 1.5 Pro as the flagship Gemini model, thanks to its superior math skills and "factuality."

As alluded to earlier, 2.0 Flash can generate and modify images alongside text. The model can also ingest photos and videos, as well as audio recordings, to answer questions about them (e.g. "What did he say?").
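
To give a concrete sense of that multimodal input, here is a small sketch that attaches a local image to a question. The `Part.from_bytes` helper is an assumption based on the SDK's documentation, and the file name is hypothetical.

```python
# Sketch: asking 2.0 Flash a question about a local image. Part.from_bytes and
# the mime_type argument are assumptions based on the google-genai SDK docs;
# "photo.jpg" is a hypothetical file.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "What is happening in this photo?",
    ],
)
print(response.text)
```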

Audio generation is 2.0 Flash's other key feature, and Doshi described it as "steerable" and "customizable." For example, the model can narrate text using one of eight voices "optimized" for different accents and languages.

"You can ask it to talk slower, you can ask it to talk faster, or you can even ask it to say something like a pirate," she added.
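
At the API level, voice selection is exposed as a speech configuration. The sketch below shows what that might look like with the SDK's Live API config types; the type names and the voice name "Puck" are assumptions from Google's developer docs, not details given in the article.

```python
# Sketch: requesting audio output with a specific prebuilt voice. The
# LiveConnectConfig/SpeechConfig/VoiceConfig/PrebuiltVoiceConfig types and the
# voice name "Puck" are assumptions based on the google-genai SDK docs.
from google.genai import types

live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck"),
        ),
    ),
)
# This config would be passed to client.aio.live.connect(...), as shown in the
# Multimodal Live API sketch further down.
```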

Now, I'm duty-bound as a journalist to note that Google didn't provide image or audio samples from 2.0 Flash. We have no way of knowing how the quality compares to outputs from other models, at least as of the time of writing.

Google says it's using its SynthID technology to watermark all audio and images generated by 2.0 Flash. On software and platforms that support SynthID, that is, select Google products, the model's outputs will be flagged as synthetic.

That's to allay fears of abuse. Indeed, deepfakes are a growing threat. According to ID verification service Sumsub, there was a 4x increase in deepfakes detected worldwide from 2023 to 2024.

Multimodal API

The production version of 2.0 Flash will land in January. But in the meantime, Google is releasing an API, the Multimodal Live API, to help developers build apps with real-time audio and video streaming functionality.

Using the Multimodal Live API, Google says, developers can create real-time, multimodal apps with audio and video inputs from cameras or screens. The API supports the integration of tools to accomplish tasks, and it can handle "natural conversation patterns" such as interruptions, along the lines of OpenAI's Realtime API.
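
A minimal sketch of what connecting to the Multimodal Live API might look like with the same SDK appears below. The `client.aio.live.connect` entry point, the session `send`/`receive` methods, and the alpha API version flag are assumptions based on Google's developer documentation; a real app would stream audio or video frames rather than a single text turn.

```python
# Sketch: a bare-bones Multimodal Live API session that sends one text turn and
# prints the streamed reply. Method names and the v1alpha api_version are
# assumptions based on the google-genai SDK docs at the time of the preview.
import asyncio
import os

from google import genai

# The Live API preview initially required the v1alpha API version (assumption).
client = genai.Client(
    api_key=os.environ["GEMINI_API_KEY"],
    http_options={"api_version": "v1alpha"},
)


async def main() -> None:
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send a single complete user turn, then stream the model's reply.
        await session.send(input="Hello, can you hear me?", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")


asyncio.run(main())
```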

The Multimodal Live API is generally available as of this morning.