The results from OpenAI's transcription benchmarking. Image Credits: OpenAI
OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve upon its previous releases.
For OpenAI, the models fit into its broader "agentic" vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of "agent" might be in dispute, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business's customers.
"We're going to see more and more agents pop up in the coming months," Godement told TechCrunch during a briefing. "And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate."
OpenAI claims that its new text-to-speech model, "gpt-4o-mini-tts," not only delivers more nuanced and realistic-sounding speech but is also more "steerable" than its previous-generation speech-synthesizing models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, "speak like a mad scientist" or "use a calm voice, like a mindfulness teacher."
Here's a "true crime-style," weathered voice:
And here's a sample of a female "professional" voice:
Jeff Harris, a member of the product staff at OpenAI, told TechCrunch that the goal is to let developers tailor both the voice "experience" and "context."
"In different contexts, you don't just want a flat, monotonous voice," Harris said. "If you're in a customer support experience and you want the voice to be apologetic because it's made a mistake, you can actually have the voice have that emotion in it … Our big belief, here, is that developers and users want to really control not just what is spoken, but how things are spoken."
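As an illustration of the kind of steering Harris describes, here is a minimal sketch of how a developer might request an apologetic customer-support tone, assuming gpt-4o-mini-tts is reachable through the OpenAI Python SDK's existing audio speech endpoint and accepts a natural-language "instructions" field; the voice name, file name, and exact parameters are assumptions for illustration, not confirmed by the article.

# Hypothetical sketch: steering gpt-4o-mini-tts with a natural-language instruction.
# Assumes the model is served by the SDK's audio.speech endpoint and that an
# "instructions" parameter controls delivery; names here are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # assumed voice name
    input="I'm sorry about the mix-up with your order. We've issued a full refund.",
    instructions="Speak in a calm, apologetic customer-support tone.",
) as response:
    response.stream_to_file("apology.mp3")  # save the generated audio locally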
As for OpenAI's new speech-to-text models, "gpt-4o-transcribe" and "gpt-4o-mini-transcribe," they effectively replace the company's long-in-the-tooth Whisper transcription model. Trained on "diverse, high-quality audio datasets," the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.
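On the transcription side, a call could look like the sketch below, which assumes the new model names are drop-in replacements on the same SDK transcription endpoint that previously served Whisper; the file name is illustrative.

# Hypothetical sketch: transcribing an audio file with gpt-4o-transcribe.
# Assumes the new model name is accepted by the same transcriptions endpoint
# that Whisper ("whisper-1") used; details are assumptions, not documented here.
from openai import OpenAI

client = OpenAI()

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for the smaller model
        file=audio_file,
    )

print(transcript.text)  # the recognized text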
They're also less likely to hallucinate, Harris added. Whisper notoriously tended to fabricate words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.
"[T]hese models are much improved versus Whisper on that front," Harris said. "Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words exactly [and] aren't filling in details that they didn't hear."
Your mileage may vary depending on the language being transcribed, however.
According to OpenAI's internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a "word error rate" approaching 30% (out of 120%) for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means three out of every 10 words from the model will differ from a human transcription in those languages.
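For context, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a model's output and a human reference, divided by the number of reference words. The snippet below is an illustrative implementation of that standard formula, not OpenAI's benchmark code.

# Illustrative word error rate (WER) calculation: word-level edit distance
# divided by the length of the human reference. Not OpenAI's benchmark code.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A 30% WER means roughly 3 of every 10 reference words are wrong in some way:
print(word_error_rate("the cat sat on the mat today ok now then",
                      "the cat sat in the hat today ok right then"))  # prints 0.3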
In a break from tradition, OpenAI doesn't plan to make its new transcription models openly available. The company historically released new versions of Whisper for commercial use under an MIT license.
Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are "much bigger than Whisper" and thus not good candidates for an open release.
"[T]hey're not the kind of models that you can just run locally on your laptop, like Whisper," he continued. "[W]e want to make sure that if we're releasing things in open source, we're doing it thoughtfully, and we have a model that's really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models."
Updated March 20, 2025, 11:54 a.m. PT to clarify the language around word error rate and updated the benchmark results chart with a more recent version.