The results from OpenAI's transcription benchmarking. Image Credits: OpenAI
OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve upon its previous releases.
For OpenAI, the models fit into its broader "agentic" vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of "agent" might be in dispute, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business's customers.
"We're going to see more and more agents pop up in the coming months," Godement told TechCrunch during a briefing. "And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate."
OpenAI claims that its new text-to-speech model, "gpt-4o-mini-tts," not only delivers more nuanced and realistic-sounding speech but is also more "steerable" than its previous-generation speech-synthesizing models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, "speak like a mad scientist" or "use a calm voice, like a mindfulness teacher."
Here's a "true crime-style," weathered voice:
And here's a sample of a female "professional" voice:
Jeff Harris, a member of the product staff at OpenAI, told TechCrunch that the goal is to let developers tailor both the voice "experience" and "context."
"In different contexts, you don't just want a flat, monotonous voice," Harris said. "If you're in a customer support experience and you want the voice to be apologetic because it's made a mistake, you can actually have the voice have that emotion in it … Our big belief, here, is that developers and users want to really control not just what is spoken, but how things are spoken."
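As an illustration of the kind of steering Harris describes, here is a minimal sketch of how a developer might request an apologetic customer-support tone, assuming gpt-4o-mini-tts is reachable through the OpenAI Python SDK's existing audio speech endpoint and accepts a natural-language "instructions" field; the voice name, file name, and exact parameters are assumptions for illustration, not confirmed by the article.

# Hypothetical sketch: steering gpt-4o-mini-tts with a natural-language instruction.
# Assumes the model is served by the SDK's audio.speech endpoint and that an
# "instructions" parameter controls delivery; names here are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # assumed voice name
    input="I'm sorry about the mix-up with your order. We've issued a full refund.",
    instructions="Speak in a calm, apologetic customer-support tone.",
) as response:
    response.stream_to_file("apology.mp3")  # save the generated audio locally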
As for OpenAI's new speech-to-text models, "gpt-4o-transcribe" and "gpt-4o-mini-transcribe," they effectively replace the company's long-in-the-tooth Whisper transcription model. Trained on "diverse, high-quality audio datasets," the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.
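On the transcription side, a call could look like the sketch below, which assumes the new model names are drop-in replacements on the same SDK transcription endpoint that previously served Whisper; the file name is illustrative.

# Hypothetical sketch: transcribing an audio file with gpt-4o-transcribe.
# Assumes the new model name is accepted by the same transcriptions endpoint
# that Whisper ("whisper-1") used; details are assumptions, not documented here.
from openai import OpenAI

client = OpenAI()

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for the smaller model
        file=audio_file,
    )

print(transcript.text)  # the recognized text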
They're also less likely to hallucinate, Harris added. Whisper notoriously tended to fabricate words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.
"[T]hese models are much improved versus Whisper on that front," Harris said. "Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words exactly [and] aren't filling in details that they didn't hear."
Your mileage may vary depending on the language being transcribed, however.
According to OpenAI's internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a "word error rate" approaching 30% (out of 120%) for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means three out of every 10 words from the model will differ from a human transcription in those languages.
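For context, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a model's output and a human reference, divided by the number of reference words. The snippet below is an illustrative implementation of that standard formula, not OpenAI's benchmark code.

# Illustrative word error rate (WER) calculation: word-level edit distance
# divided by the length of the human reference. Not OpenAI's benchmark code.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A 30% WER means roughly 3 of every 10 reference words are wrong in some way:
print(word_error_rate("the cat sat on the mat today ok now then",
                      "the cat sat in the hat today ok right then"))  # prints 0.3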
In a break from tradition, OpenAI doesn't plan to make its new transcription models openly available. The company historically released new versions of Whisper for commercial use under an MIT license.
Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are "much bigger than Whisper" and thus not good candidates for an open release.
"[T]hey're not the kind of models that you can just run locally on your laptop, like Whisper," he continued. "[W]e want to make sure that if we're releasing things in open source, we're doing it thoughtfully, and we have a model that's really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models."
Updated March 20, 2025, 11:54 a.m. PT to clarify the language around word error rate and updated the benchmark results chart with a more recent version.