Gemini stage presentation at Made by Google '24. Image Credits: Maxwell Zeff

Google's next major AI model has arrived to combat a slew of new offerings from OpenAI.

On Wednesday, Google announced Gemini 2.0 Flash, which the company says can natively generate images and audio in addition to text. 2.0 Flash can also use third-party apps and services, allowing it to tap into Google Search, execute code, and more.

An experimental release of 2.0 Flash will be available through the Gemini API and Google's AI developer platforms, AI Studio and Vertex AI, starting today. However, the audio and image generation capabilities are launching only for "early access partners" ahead of a wider rollout in January.
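
For developers who want to try that experimental release, a minimal sketch of a text request through the Gemini API might look like the following. The `google-genai` Python SDK, the `gemini-2.0-flash-exp` model identifier, and the environment variable name are assumptions based on Google's developer documentation, not details from this article.

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model via the
# Gemini API. Assumes the google-genai SDK is installed (pip install google-genai)
# and that GEMINI_API_KEY is set. The model name "gemini-2.0-flash-exp" is an
# assumption taken from Google's developer docs, not from this article.
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarize what 'natively multimodal' means in one sentence.",
)
print(response.text)
```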

In the coming months, Google says it'll bring 2.0 Flash in a range of flavors to products like Android Studio, Chrome DevTools, Firebase, Gemini Code Assist, and others.

Flash, upgraded

The first-gen Flash, 1.5 Flash, could only generate text, and it wasn't designed for especially demanding workloads. This new model is more versatile, Google says, in part because it can call tools like Search and interact with external APIs.
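
As a rough illustration of that tool-calling ability, the sketch below enables the built-in Google Search tool on a single request. The `Tool`/`GoogleSearch` config shape is an assumption drawn from the SDK's documentation rather than anything stated in the article.

```python
# Sketch: asking 2.0 Flash to ground an answer with the built-in Google Search
# tool. The GenerateContentConfig/Tool/GoogleSearch types are assumptions based
# on the google-genai SDK docs; adjust to whatever the current SDK exposes.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google announce about Gemini 2.0 Flash this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```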

"We know Flash is extremely popular with developers for its … balance of speed and performance," Tulsee Doshi, head of product for Gemini models at Google, said during a briefing Tuesday. "And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful."

Google claims that 2.0 Flash, which is twice as fast as the company's Gemini 1.5 Pro model on certain benchmarks per Google's own testing, is "significantly" better in areas like coding and image analysis. In fact, the company says, 2.0 Flash displaces 1.5 Pro as the flagship Gemini model, thanks to its superior math skills and "factuality."

As alluded to earlier, 2.0 Flash can generate and modify images alongside text. The model can also ingest photos and videos, as well as audio recordings, to answer questions about them (e.g. "What did he say?").
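
To give a concrete sense of that multimodal input, here is a small sketch that attaches a local image to a question. The `Part.from_bytes` helper is an assumption based on the SDK's documentation, and the file name is hypothetical.

```python
# Sketch: asking 2.0 Flash a question about a local image. Part.from_bytes and
# the mime_type argument are assumptions based on the google-genai SDK docs;
# "photo.jpg" is a hypothetical file.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "What is happening in this photo?",
    ],
)
print(response.text)
```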

Audio generation is 2.0 Flash's other key feature, and Doshi described it as "steerable" and "customizable." For example, the model can narrate text using one of eight voices "optimized" for different accents and languages.

"You can ask it to talk slower, you can ask it to talk faster, or you can even ask it to say something like a pirate," she added.
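
At the API level, voice selection is exposed as a speech configuration. The sketch below shows what that might look like with the SDK's Live API config types; the type names and the voice name "Puck" are assumptions from Google's developer docs, not details given in the article.

```python
# Sketch: requesting audio output with a specific prebuilt voice. The
# LiveConnectConfig/SpeechConfig/VoiceConfig/PrebuiltVoiceConfig types and the
# voice name "Puck" are assumptions based on the google-genai SDK docs.
from google.genai import types

live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck"),
        ),
    ),
)
# This config would be passed to client.aio.live.connect(...), as shown in the
# Multimodal Live API sketch further down.
```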

Now, I'm duty-bound as a journalist to note that Google didn't provide image or audio samples from 2.0 Flash. We have no way of knowing how the quality compares to outputs from other models, at least as of the time of writing.

Google says it's using its SynthID technology to watermark all audio and images generated by 2.0 Flash. On software and platforms that support SynthID, that is, select Google products, the model's outputs will be flagged as synthetic.

That's to allay fears of abuse. Indeed, deepfakes are a growing threat. According to ID verification service Sumsub, there was a 4x increase in deepfakes detected worldwide from 2023 to 2024.

Multimodal API

The production version of 2.0 Flash will land in January. But in the meantime, Google is releasing an API, the Multimodal Live API, to help developers build apps with real-time audio and video streaming functionality.

Using the Multimodal Live API, Google says, developers can create real-time, multimodal apps with audio and video inputs from cameras or screens. The API supports the integration of tools to accomplish tasks, and it can handle "natural conversation patterns" such as interruptions, along the lines of OpenAI's Realtime API.
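
A minimal sketch of what connecting to the Multimodal Live API might look like with the same SDK appears below. The `client.aio.live.connect` entry point, the session `send`/`receive` methods, and the alpha API version flag are assumptions based on Google's developer documentation; a real app would stream audio or video frames rather than a single text turn.

```python
# Sketch: a bare-bones Multimodal Live API session that sends one text turn and
# prints the streamed reply. Method names and the v1alpha api_version are
# assumptions based on the google-genai SDK docs at the time of the preview.
import asyncio
import os

from google import genai

# The Live API preview initially required the v1alpha API version (assumption).
client = genai.Client(
    api_key=os.environ["GEMINI_API_KEY"],
    http_options={"api_version": "v1alpha"},
)


async def main() -> None:
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send a single complete user turn, then stream the model's reply.
        await session.send(input="Hello, can you hear me?", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")


asyncio.run(main())
```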

The Multimodal Live API is generally available as of this morning.