Image Credits: Michael Nagle/Bloomberg / Getty Images
On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims that Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality.
Nova Sonic is Amazon’s answer to newer AI voice models, such as the model powering ChatGPT’s Voice Mode, which feel more natural to speak with than the more rigid models from Amazon Alexa’s early days. Recent technical breakthroughs have made legacy models and the digital assistants they underpin, such as Alexa and Apple’s Siri, seem incredibly stilted by comparison.
Nova Sonic is available through Bedrock, Amazon’s developer platform for building enterprise AI applications, via a new bi-directional streaming API. In a press release, Amazon called Nova Sonic “the most cost-efficient” AI voice model on the market, and around 80% less expensive than OpenAI’s GPT-4o.
Components of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital voice assistant, according to Amazon SVP and Head Scientist of AGI Rohit Prasad.
In an interview, Prasad told TechCrunch that Nova Sonic builds on Amazon’s expertise in “large orchestration systems,” the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Nova Sonic excels at routing user requests to different APIs, said Prasad. This capability helps Nova Sonic “know” when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application, and then use the appropriate tool to do it.
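To make the idea of request routing concrete, here is a minimal sketch of a dispatcher that picks one of three tools for a request. It is not Amazon's implementation, which the article does not describe. The tool names, keyword lists, and handler functions are all hypothetical, and a real voice model would infer intent from the conversation rather than match keywords.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the three kinds of tools the article mentions.
def fetch_live_info(request: str) -> str:
    return f"[web lookup] {request}"

def query_private_data(request: str) -> str:
    return f"[internal data] {request}"

def act_in_app(request: str) -> str:
    return f"[app action] {request}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "live_info": fetch_live_info,
    "private_data": query_private_data,
    "app_action": act_in_app,
}

# Assumed keyword lists, for illustration only.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "live_info": ("weather", "news", "score"),
    "private_data": ("order", "account", "invoice"),
    "app_action": ("book", "schedule", "send"),
}

def route(request: str) -> str:
    """Return the name of the first tool whose keywords match, or 'none'."""
    words = request.lower().split()
    for tool, keys in KEYWORDS.items():
        if any(key in words for key in keys):
            return tool
    return "none"

def handle(request: str) -> str:
    """Send the request to the chosen tool, or answer directly if none fits."""
    tool = route(request)
    if tool == "none":
        return f"[direct answer] {request}"
    return TOOLS[tool](request)
```

In this toy version, a request such as "Where is my order" would go to the private data tool, while a request that matches no keyword is answered without calling any tool.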
During a two-way dialogue, Nova Sonic waits to speak “at the appropriate time,” taking into account a speaker’s pauses and interruptions, says Amazon. It also generates a text transcript of the user’s speech, which developers can use for various applications.
Nova Sonic is less prone to speech recognition errors than other AI voice models, according to Prasad, meaning the model is relatively good at understanding a user’s intent even if they mumble, misspeak, or are in a noisy setting. On a benchmark measuring speech recognition across languages and dialects, Multilingual LibriSpeech, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. That means that roughly four out of every 100 words from the model differed from a human transcription in those languages.
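For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the human reference, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name and the lowercase-and-split normalization are assumptions for illustration; the article does not say how Amazon or the benchmark normalizes text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With a six-word reference and one wrong word in the transcript, this returns 1/6, or about 16.7%. A score of 4.2% corresponds to roughly four such errors per 100 reference words.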
On another benchmark measuring loud interactions with multiple participants, Augmented Multi Party Interaction, Amazon says Nova Sonic was 46.7% more accurate in terms of WER than OpenAI’s GPT-4o-transcribe model. Nova Sonic also has industry-leading speed, with an average perceived latency of 1.09 seconds, according to Amazon. That makes it faster than the GPT-4o model powering OpenAI’s Realtime API, which responds in 1.18 seconds, per benchmarking by Artificial Analysis.
Prasad says Nova Sonic is a part of Amazon’s broader strategy to build AGI (artificial general intelligence), which the company defines as “AI systems that can do anything a human can do on a computer.” Moving forward, Prasad says Amazon plans to release more AI models that can understand different modalities, including image, video, and voice, as well as “other sensory data that are relevant if you bring things into the physical world.”
Amazon’s AGI division, which Prasad oversees, seems to be playing a larger role in the company’s product strategy these days. Just last week, Amazon launched a preview of Nova Act, a browser-using AI model that seems to be powering elements of Alexa+ and Amazon’s Buy for Me feature. Starting with Nova Sonic, Prasad says the company wants to offer more of its internal AI models for developers to build with.