Google's next major AI model has arrived to combat a slew of new offerings from OpenAI.

On Wednesday, Google announced Gemini 2.0 Flash, which the company says can natively generate images and audio in addition to text. 2.0 Flash can also use third-party apps and services, allowing it to tap into Google Search, execute code, and more.
An experimental release of 2.0 Flash will be available through the Gemini API and Google's AI developer platforms, AI Studio and Vertex AI, starting today. However, the audio and image generation capabilities are launching only for "early access partners" ahead of a wider rollout in January.
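For developers who want to poke at the experimental release, a minimal sketch of a text request through the Gemini API is shown below, using the google-generativeai Python SDK. The model identifier "gemini-2.0-flash-exp" and the placeholder API key are assumptions, so check AI Studio's documentation for the exact names.

```python
# Minimal sketch: calling the experimental 2.0 Flash release via the Gemini API.
# Assumes the google-generativeai Python SDK; the model id below is an assumption.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key from AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed experimental model id
response = model.generate_content("Summarize what tool calling lets a model do.")
print(response.text)
```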
In the coming months, Google says that it'll bring 2.0 Flash in a range of flavors to products like Android Studio, Chrome DevTools, Firebase, Gemini Code Assist, and others.
Flash, upgraded
The first-gen Flash, 1.5 Flash, could generate only text, and wasn't designed for especially demanding workloads. This new model is more versatile, Google says, in part because it can call tools like Search and interact with external APIs.

"We know Flash is extremely popular with developers for its … balance of speed and performance," Tulsee Doshi, head of product for Gemini models at Google, said during a briefing Tuesday. "And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful."

Google claims that 2.0 Flash, which is twice as fast as the company's Gemini 1.5 Pro model on certain benchmarks, per Google's own testing, is "significantly" better in areas like coding and image analysis. In fact, the company says, 2.0 Flash displaces 1.5 Pro as the flagship Gemini model, thanks to its superior math skills and "factuality."
As alluded to earlier, 2.0 Flash can generate and modify images alongside text. The model can also ingest photos and videos, as well as audio recordings, to answer questions about them (e.g. "What did he say?").
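As an illustration of that multimodal input, the sketch below uploads an audio recording and asks the model a question about it. It assumes the google-generativeai SDK's file upload helper, the same experimental model id as above, and a hypothetical local file name.

```python
# Sketch: asking 2.0 Flash a question about an uploaded audio recording.
# The model id and the local file name are assumptions for illustration.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

clip = genai.upload_file("interview_clip.mp3")  # hypothetical audio file
model = genai.GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content([clip, "What did he say?"])
print(response.text)
```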
Audio generation is 2.0 Flash's other key feature, and Doshi described it as "steerable" and "customizable." For example, the model can narrate text using one of eight voices "optimized" for different accents and languages.

"You can ask it to talk slower, you can ask it to talk faster, or you can even ask it to say something like a pirate," she added.

Now, I'm duty-bound as a journalist to note that Google didn't provide image or audio samples from 2.0 Flash. We have no way of knowing how the quality compares to outputs from other models, at least as of the time of writing.
Google says it's using its SynthID technology to watermark all audio and images generated by 2.0 Flash. On software and platforms that support SynthID (that is, select Google products), the model's outputs will be flagged as synthetic.

That's to allay fears of abuse. Indeed, deepfakes are a growing threat. According to ID verification service Sumsub, there was a 4x increase in deepfakes detected worldwide from 2023 to 2024.
Multimodal API
The production version of 2.0 Flash will land in January. But in the meantime, Google is releasing an API, the Multimodal Live API, to help developers build apps with real-time audio and video streaming functionality.

Using the Multimodal Live API, Google says, developers can create real-time, multimodal apps with audio and video inputs from cameras or screens. The API supports the integration of tools to accomplish tasks, and it can handle "natural conversation patterns" such as interruptions, along the lines of OpenAI's Realtime API.
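As a rough sketch of what a streaming session might look like, the snippet below opens a live session and exchanges a single text turn. It assumes the google-genai Python SDK's live module, the experimental model id used above, and a text-only response modality; the exact method names and config keys may differ from the shipping API, and a real app would stream microphone or camera input instead.

```python
# Rough sketch of a Multimodal Live API session, assuming the google-genai
# Python SDK. Model id, config keys, and method names are assumptions and may
# differ from the released interface.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    config = {"response_modalities": ["TEXT"]}  # audio output is also supported
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="In one sentence, what can you do?", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```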
The Multimodal Live API is generally available as of this morning.