Image Credits:Gladia (Image has been modified)
French startup Gladia, which offers a speech-recognition application programming interface (API), has raised $16 million in a Series A funding round. Essentially, Gladia's API lets you turn any audio file into text with a high level of accuracy and low latency.
While Amazon, Microsoft, and Google all offer speech-to-text APIs as part of their cloud-hosting product suites, they don't perform as well as the newer models offered by specialized startups.
There has been tremendous progress in this field over the past couple of years, particularly after the release of Whisper by OpenAI. Gladia competes with other well-funded companies in the space, such as AssemblyAI, Deepgram and Speechmatics.
Gladia originally offered a fine-tuned version of Whisper's speech-to-text model with some much-needed improvements. For instance, the startup supports diarization out of the box: it can detect when there are multiple speakers in a conversation and separate the recording, and the transcribed text, depending on who's talking.
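To make diarization concrete: a diarized transcript is typically a list of time-stamped segments, each tagged with a speaker label, and an application usually merges consecutive segments from the same speaker into "turns." The sketch below assumes such a shape; the `Segment` class and `merge_turns` helper are hypothetical illustrations, not Gladia's actual response format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    """One transcribed chunk of audio (hypothetical shape, not Gladia's schema)."""
    speaker: int   # speaker label assigned by a diarization step
    start: float   # seconds from the beginning of the recording
    end: float
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Collapse consecutive segments from the same speaker into a single turn."""
    turns = []
    ordered = sorted(segments, key=lambda s: s.start)
    # groupby only groups *adjacent* items, so a speaker who talks twice
    # with someone else in between gets two separate turns.
    for speaker, items in groupby(ordered, key=lambda s: s.speaker):
        items = list(items)
        turns.append(Segment(
            speaker=speaker,
            start=items[0].start,
            end=items[-1].end,
            text=" ".join(s.text for s in items),
        ))
    return turns


if __name__ == "__main__":
    segments = [
        Segment(0, 0.0, 1.8, "Thanks for joining."),
        Segment(0, 1.8, 3.1, "Shall we start?"),
        Segment(1, 3.4, 4.0, "Sure."),
    ]
    for turn in merge_turns(segments):
        print(f"Speaker {turn.speaker} [{turn.start:.1f}-{turn.end:.1f}s]: {turn.text}")
```

Running it prints two turns: speaker 0 covering 0.0 to 3.1 seconds, then speaker 1.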
Gladia supports 100 languages and a wide variety of accents. This reporter can confirm that it works, as we've been using Gladia to transcribe some interviews, and accents weren't an issue.
The startup offers its speech-to-text model as a hosted API that users can leverage in their own applications and services. More than 600 companies use Gladia, including several meeting recorders and note-taking assistants like Attention, Circleback, Method Financial, Recall, Sana and Veed.io.
That particular use case is interesting because many companies have to chain API calls. They first turn speech into text, which they then feed into a large language model (LLM), such as GPT-4o or Claude 3.5 Sonnet, to extract knowledge from large walls of text.
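The chained pattern can be sketched as two independent calls, each of which would normally be a network round trip to a different provider. In this minimal sketch, `transcribe` and `summarize` are hypothetical local stubs standing in for a speech-to-text API and an LLM API; they do no real recognition or summarization.

```python
from typing import Callable


# Hypothetical stand-ins for two separate hosted services. In a real app each
# would be an HTTP call to a different provider; here they are local stubs.
def transcribe(audio: bytes) -> str:
    """Pretend speech-to-text call (stub: simply decodes the bytes as UTF-8)."""
    return audio.decode("utf-8")


def summarize(transcript: str, max_points: int = 3) -> list[str]:
    """Pretend LLM call (stub: turns the first sentences into bullet points)."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    return [f"- {s}" for s in sentences[:max_points]]


def chained_pipeline(audio: bytes,
                     stt: Callable[[bytes], str] = transcribe,
                     llm: Callable[[str], list[str]] = summarize) -> list[str]:
    """Two hops: audio -> text with one service, then text -> summary with another."""
    transcript = stt(audio)
    return llm(transcript)


if __name__ == "__main__":
    audio = b"We agreed on the budget. Launch is in May. Next call Friday. Thanks all."
    for point in chained_pipeline(audio):
        print(point)
```

Every hop adds latency, a second set of credentials, and a second failure point; folding the LLM step into the transcription call removes one of those hops.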
With the new funding, Gladia wants to simplify that pipeline by integrating audio intelligence and LLM-based tasks in a single API call. For instance, a customer could get a conversation summary in a handful of bullet points without having to rely on a third-party LLM API.
The other issue that Gladia is looking to solve is latency. You may have seen some demos of real-time audio conversations with an AI-based phone agent (11x has a good demo on its website), and these systems have to be able to transcribe in near real time to make such conversations sound as human-like as possible.
“We realized that real time wasn't very good in terms of quality in the market in general. And people had a weird use case. They were doing real-time processing, and then they were grabbing the audio and running it in batch. We wondered: ‘Why are you doing this?’ They told us: ‘The quality isn't good in real-time processing, so we transcribe it in batch afterwards,’” co-founder and CEO Jean-Louis Quéguiner (pictured above; right) told TechCrunch.
Gladia chose to tackle this problem, and it can currently transcribe a live conversation with a latency of under 300 milliseconds. The company claims that its real-time processing is now more or less as good as its default, asynchronous batch-processing API, but it's hard for us to judge without some proper testing. As Quéguiner puts it, the startup is aiming for “batch quality with real-time capability.”
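One reason sub-300 ms latency is hard: a streaming client sends audio in small chunks, and the chunk length itself eats into the budget, since a word spoken at the start of a chunk can't be transcribed until that chunk is sent. The article doesn't describe Gladia's streaming protocol; this is a generic sketch that assumes raw 16 kHz, 16-bit mono PCM audio and treats the 300 ms figure as an end-to-end budget. All names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000   # assumed: 16 kHz mono audio
BYTES_PER_SAMPLE = 2      # assumed: 16-bit signed PCM


def frame_size_bytes(frame_ms: int,
                     sample_rate: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Number of bytes in one chunk of raw PCM audio of the given duration."""
    return sample_rate * bytes_per_sample * frame_ms // 1000


def split_frames(pcm: bytes, frame_ms: int) -> list[bytes]:
    """Cut a PCM buffer into fixed-duration chunks, as a streaming client might send them."""
    size = frame_size_bytes(frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


def remaining_budget_ms(frame_ms: int, budget_ms: int = 300) -> int:
    """Worst case, audio at the start of a chunk waits a full chunk before it is
    sent, so that much of the budget is gone before the server sees anything."""
    return budget_ms - frame_ms


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
    print(frame_size_bytes(100))                 # bytes per 100 ms chunk
    print(len(split_frames(one_second, 100)))    # chunks in one second
    print(remaining_budget_ms(100))              # ms left for network + model
```

With 100 ms chunks, each chunk is 16,000 × 2 × 0.1 = 3,200 bytes, and only about 200 ms remain for the network and the model. Smaller chunks leave more headroom but mean more messages and less context per request.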
AI phone agents aside, you could imagine a call center using those real-time capabilities to help call agents find relevant information in the middle of a call. “Our single API is compatible with all existing tech stacks and protocols, including SIP, VoIP, FreeSwitch and Asterisk,” co-founder and CTO Jonathan Soto (pictured above; left) said in a statement.
XAnge is leading the Series A funding round. Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures, and Soma Capital are also participating.
Gladia believes we are on the verge of a “ChatGPT moment” for audio applications. GPT technology has been around for years, but ChatGPT really popularized LLMs with its consumer chat-like interface.
As Apple and Google start including transcription models in iOS and Android, consumers will start to understand the value of automated transcription within the apps they use. Developers will probably then integrate audio features into their products, and that's where API providers like Gladia will come in.