Andy Jassy, chief executive officer of Amazon.com Inc.

Image Credits: Michael Nagle/Bloomberg / Getty Images

Cloud Computing

Commerce

Crypto

Enterprise

EVs

Fintech

fund-raise

Gadgets

Gaming

Google

Government & Policy

ironware

Instagram

Layoffs

Media & Entertainment

Meta

Microsoft

privateness

Robotics

certificate

Social

blank

Startups

TikTok

transfer

speculation

More from TechCrunch

Events

Startup Battlefield

StrictlyVC

Podcasts

Videos

Partner Content

TechCrunch Brand Studio

Crunchboard

Contact Us

On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims that Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality.

Nova Sonic is Amazon’s answer to newer AI voice models, such as the model powering ChatGPT’s Voice Mode, which feel more natural to speak with than the more rigid models from Amazon Alexa’s early days. Recent technical breakthroughs have made legacy models and the digital assistants they underpin, such as Alexa and Apple’s Siri, seem incredibly stilted by comparison.

Nova Sonic is available through Bedrock, Amazon’s developer platform for building enterprise AI applications, via a new bi-directional streaming API. In a press release, Amazon called Nova Sonic “the most cost-efficient” AI voice model on the market, and around 80% less expensive than OpenAI’s GPT-4o.

Components of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital voice assistant, according to Amazon SVP and Head Scientist of AGI Rohit Prasad.

In an interview, Prasad told TechCrunch that Nova Sonic builds on Amazon’s expertise in “large orchestration systems,” the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Nova Sonic excels at routing user requests to different APIs, said Prasad. This capability helps Nova Sonic “know” when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application, and use the right tool to do it.
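The routing idea can be illustrated with a minimal sketch of the application side of tool use. This is not Amazon's implementation or the Bedrock API: the tool names, the request shape (`name` plus `arguments`), and the stub functions are all hypothetical, chosen only to mirror the three cases described (fetching live information, querying a proprietary data source, acting in an external app). It assumes the model has already decided which tool to call and the application simply dispatches to it.

```python
from typing import Callable, Dict

# Hypothetical tools: names and behavior are illustrative assumptions,
# not part of Nova Sonic or Bedrock.
def fetch_weather(city: str) -> str:
    """Stand-in for fetching real-time information from the internet."""
    return f"(stub) current weather for {city}"

def lookup_order(order_id: str) -> str:
    """Stand-in for querying a proprietary data source."""
    return f"(stub) status of order {order_id}"

def book_table(restaurant: str) -> str:
    """Stand-in for taking an action in an external application."""
    return f"(stub) booked a table at {restaurant}"

# Registry mapping a tool name to the function that implements it.
TOOLS: Dict[str, Callable[..., str]] = {
    "fetch_weather": fetch_weather,
    "lookup_order": lookup_order,
    "book_table": book_table,
}

def dispatch(tool_request: dict) -> str:
    """Invoke the tool named in a (hypothetical) model-issued request."""
    name = tool_request.get("name")
    arguments = tool_request.get("arguments", {})
    tool = TOOLS.get(name)
    if tool is None:
        # Unknown tools are reported rather than raising, so the caller
        # can relay the failure back to the model.
        return f"unknown tool: {name}"
    return tool(**arguments)

if __name__ == "__main__":
    print(dispatch({"name": "fetch_weather", "arguments": {"city": "Seattle"}}))
```

The hard part, which this sketch leaves out entirely, is the model's decision about which tool fits a spoken request; that is the capability Prasad is describing.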

During a two-way dialogue, Nova Sonic waits to speak “at the appropriate time,” taking into account a speaker’s pauses and interruptions, says Amazon. It also generates a text transcript of the user’s speech, which developers can use for various applications.

Nova Sonic is less prone to speech recognition errors than other AI voice models, according to Prasad, meaning the model is comparatively good at understanding a user’s intent even if they mumble, misspeak, or are in a noisy setting. On a benchmark measuring speech recognition across languages and dialects, Multilingual LibriSpeech, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. That means that roughly four out of every 100 words from the model differed from a human transcription in those languages.
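For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the model's transcript and a human reference (substitutions, deletions, and insertions), divided by the number of words in the reference. The sketch below shows that standard calculation. It is not Amazon's evaluation code, the function name is our own, and real benchmark pipelines typically also normalize casing and punctuation before scoring, which this example skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    human = "turn on the kitchen lights"
    model = "turn on the chicken lights"
    print(word_error_rate(human, model))  # one substitution in five words
```

Because insertions count as errors, WER can exceed 100% when a transcript contains many extra words, so it is an error ratio rather than a strict percentage of words wrong.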

On another benchmark measuring loud interactions with multiple participants, Augmented Multi Party Interaction, Amazon says Nova Sonic was 46.7% more accurate in terms of WER than OpenAI’s GPT-4o-transcribe model. Nova Sonic also has industry-leading speed, with an average perceived latency of 1.09 seconds, according to Amazon. That makes it faster than the GPT-4o model powering OpenAI’s Realtime API, which responds in 1.18 seconds, per benchmarking by Artificial Analysis.

Prasad says Nova Sonic is a part of Amazon’s broader strategy to build AGI (artificial general intelligence), which the company defines as “AI systems that can do anything a human can do on a computer.” Moving forward, Prasad says Amazon plans to release more AI models that can understand different modalities, including image, video, and voice, as well as “other sensory data that are relevant if you bring things into the physical world.”

Amazon’s AGI division, which Prasad oversees, seems to be playing a larger role in the company’s product strategy these days. Just last week, Amazon launched a preview of Nova Act, a browser-using AI model that seems to be powering elements of Alexa+ and Amazon’s Buy for Me feature. Starting with Nova Sonic, Prasad says the company wants to offer more of its internal AI models for developers to build with.