[Article image] Image Credits: Bryce Durbin / TechCrunch

[Chart: Results from OpenAI's transcription benchmarking. Image Credits: OpenAI]

OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve upon its previous releases.

For OpenAI, the models fit into its broader "agentic" vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of "agent" might be in dispute, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business's customers.

"We're going to see more and more agents pop up in the coming months," Godement told TechCrunch during a briefing. "And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate."

OpenAI claims that its new text-to-speech model, "gpt-4o-mini-tts," not only delivers more nuanced and realistic-sounding speech but is also more "steerable" than its previous-gen speech-synthesizing models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, "speak like a mad scientist" or "use a calm voice, like a mindfulness teacher."
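The article doesn't show what such a request looks like on the wire, but the natural-language steering prompt presumably travels as an extra field alongside the text to be spoken. Below is a minimal sketch of building such a request body; the endpoint path, the `instructions` field name, and the voice name are assumptions for illustration, not details confirmed by the article:

```python
import json

def build_tts_request(text: str, style_instructions: str,
                      model: str = "gpt-4o-mini-tts",
                      voice: str = "coral") -> str:
    """Build a JSON body for a hypothetical POST to /v1/audio/speech.

    The `instructions` field (assumed name) carries the natural-language
    steering prompt described in the article, e.g. "speak like a mad
    scientist" or "use a calm voice, like a mindfulness teacher".
    """
    payload = {
        "model": model,
        "voice": voice,            # assumed voice name
        "input": text,             # the words to be spoken
        "instructions": style_instructions,  # *how* to speak them
    }
    return json.dumps(payload)

body = build_tts_request(
    "Sorry about the mix-up; let me fix that right away.",
    "Speak in a calm, apologetic customer-support tone.",
)
```

The point of the sketch is the separation Harris describes: `input` controls what is spoken, while the steering prompt controls how it is spoken.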

Here's a "true crime-style," weathered voice: [audio sample]

And here's a sample of a female "professional" voice: [audio sample]

Jeff Harris, a member of the product staff at OpenAI, told TechCrunch that the goal is to let developers tailor both the voice "experience" and "context."

"In different contexts, you don't just want a flat, monotonous voice," Harris said. "If you're in a customer support experience and you want the voice to be apologetic because it's made a mistake, you can actually have the voice carry that emotion … Our big belief, here, is that developers and users want to really control not just what is spoken, but how things are spoken."

As for OpenAI's new speech-to-text models, "gpt-4o-transcribe" and "gpt-4o-mini-transcribe," they effectively replace the company's long-in-the-tooth Whisper transcription model. Trained on "diverse, high-quality audio datasets," the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.

They're also less likely to hallucinate, Harris added. Whisper notoriously tended to invent words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.

"[T]hese models are much improved versus Whisper on that front," Harris said. "Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely [and] aren't filling in details that they didn't hear."

Your mileage may vary depending on the language being transcribed, however.

According to OpenAI's internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a "word error rate" approaching 30% (out of 120%) for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means three out of every 10 words from the model will differ from a human transcription in those languages.
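Word error rate is the word-level edit distance (substitutions, insertions, and deletions) between the model's transcript and a human reference, divided by the reference's word count, which is also why the scale can run past 100%: a model that inserts many extra words accrues more errors than the reference has words. A minimal sketch of the standard computation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# 3 substituted words in a 10-word reference -> WER of 0.3,
# i.e. the "three out of every 10 words" in the article:
word_error_rate("the quick brown fox jumps over the lazy dog ok",
                "the quick crown fox jumped over a lazy dog ok")  # → 0.3
```

Because insertions count against the reference length, a short reference with many hallucinated extra words yields a WER above 1.0.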

In a break from tradition, OpenAI doesn't plan to make its new transcription models openly available. The company historically released new versions of Whisper for commercial use under an MIT license.

Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are "much bigger than Whisper" and thus not good candidates for an open release.

"[T]hey're not the kind of model that you can just run locally on your laptop, like Whisper," he continued. "[W]e want to make sure that if we're releasing things in open source, we're doing it thoughtfully, and we have a model that's really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models."

Updated March 20, 2025, 11:54 a.m. PT to clarify the language around word error rate and updated the benchmark results chart with a more recent version.