ChatGPT maker OpenAI has announced its next major product release: a generative AI model code-named Strawberry, officially called OpenAI o1.

To be more precise, o1 is actually a family of models. Two are available Thursday in ChatGPT and via OpenAI’s API: o1-preview and o1-mini, a smaller, more efficient model aimed at code generation.
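For developers, the new models are exposed through the same chat completions endpoint as OpenAI’s other models. Below is a minimal sketch of calling o1-preview, assuming the official openai Python SDK and an API key in the environment; reasoning-model restrictions at launch (such as limited parameter support) may apply.

```python
# Minimal sketch: calling o1-preview through OpenAI's API.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the smaller, code-focused model
    messages=[
        {"role": "user", "content": "Outline a proof that the square root of 2 is irrational."},
    ],
)

print(response.choices[0].message.content)
```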

Note that the o1 chatbot experience is fairly barebones at present. Unlike GPT-4o, o1’s forebear, o1 can’t browse the web or analyze files yet. The model does have image-analyzing features, but they’ve been disabled pending additional testing. And o1 is rate-limited; weekly limits are currently 30 messages for o1-preview and 50 for o1-mini.

In another downside, o1 is expensive. Very expensive. In the API, o1-preview is $15 per 1 million input tokens and $60 per 1 million output tokens. That’s 3x the cost versus GPT-4o for input and 4x the cost for output. (“Tokens” are bits of raw data; 1 million is equivalent to around 750,000 words.)
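For a rough sense of what that pricing means per request, here is a back-of-the-envelope sketch; the token counts are made-up illustration numbers, not measurements, and o1’s hidden reasoning tokens are reportedly billed as output tokens, which pushes real output counts higher.

```python
# Back-of-the-envelope cost estimate at o1-preview's published API pricing.
# The token counts below are hypothetical, purely for illustration.
INPUT_PRICE = 15 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 60 / 1_000_000   # dollars per output token

input_tokens = 2_000    # hypothetical prompt size
output_tokens = 5_000   # reasoning-heavy responses tend to be long

cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"Estimated cost for one request: ${cost:.2f}")  # $0.33 in this example
```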

OpenAI says it plans to bring o1-mini access to all free users of ChatGPT but hasn’t set a release date. We’ll hold the company to it.

Chain of reasoning

OpenAI o1 avoids some of the reasoning pitfalls that normally trip up generative AI models because it can effectively fact-check itself by spending more time considering all parts of a question. What makes o1 “feel” qualitatively different from other generative AI models is its ability to “think” before responding to queries, according to OpenAI.

When given additional time to “think,” o1 can reason through a task holistically, planning ahead and performing a series of actions over an extended period of time that helps the model arrive at an answer. This makes o1 well-suited for tasks that require synthesizing the results of multiple subtasks, like detecting privileged emails in an attorney’s inbox or brainstorming a product marketing strategy.

In a series of posts on X on Thursday, Noam Brown, a research scientist at OpenAI, said that “o1 is trained with reinforcement learning.” This teaches the system “to ‘think’ before responding via a private chain of thought” through rewards when o1 gets answers right and penalties when it does not, he said.
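OpenAI hasn’t published the training recipe, but Brown’s description follows a familiar pattern: sample a private chain of thought plus an answer, grade the answer, and reward or penalize accordingly. The toy sketch below is purely illustrative of that loop; every name in it is a hypothetical placeholder, not OpenAI’s code.

```python
# Purely illustrative toy of the reward-and-penalty loop Brown describes; not OpenAI's training code.
def sample_with_chain_of_thought(model, problem):
    """Stand-in for the model producing a private chain of thought plus a final answer."""
    chain_of_thought = f"(hidden reasoning about {problem!r})"
    answer = model.get(problem, "unknown")
    return chain_of_thought, answer

def reinforcement_step(model, problem, reference_answer):
    chain_of_thought, answer = sample_with_chain_of_thought(model, problem)
    # Reward correct final answers, penalize incorrect ones.
    reward = 1.0 if answer == reference_answer else -1.0
    # A real system would use this reward to update the model's weights so that
    # chains of thought leading to correct answers become more likely.
    return reward

toy_model = {"2 + 2": "4"}
print(reinforcement_step(toy_model, "2 + 2", "4"))   # 1.0 (reward)
print(reinforcement_step(toy_model, "3 + 5", "8"))   # -1.0 (penalty)
```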

Brown alluded to the fact that OpenAI leveraged a new optimization algorithm and a training dataset containing “reasoning data” and scientific literature specifically tailored for reasoning tasks. “The longer [o1] thinks, the better it does,” he said.

TechCrunch wasn’t offered the opportunity to test o1 before its unveiling; we’ll get our hands on it as soon as possible. But according to a person who did have access, Pablo Arredondo, VP at Thomson Reuters, o1 is better than OpenAI’s previous models (e.g., GPT-4o) at things like analyzing legal briefs and identifying solutions to problems in LSAT logic games.

“We saw it tackling more substantive, multi-faceted analysis,” Arredondo told TechCrunch. “Our automated testing also showed gains against a wide range of simple tasks.”

In a qualifying exam for the International Mathematical Olympiad (IMO), a high school math competition, o1 correctly solved 83% of problems while GPT-4o only solved 13%, according to OpenAI. (That’s less impressive when you consider that Google DeepMind’s recent AI achieved a silver medal in an equivalent to the actual IMO contest.) OpenAI also says that o1 reached the 89th percentile of participants, better than DeepMind’s flagship system AlphaCode 2, for what it’s worth, in the online programming challenge rounds known as Codeforces.

In general, o1 should perform better on problems in data analysis, science, and coding, OpenAI says. (GitHub, which tested o1 with its AI coding assistant GitHub Copilot, reports that the model is adept at optimizing algorithms and app code.) And, at least per OpenAI’s benchmarking, o1 improves over GPT-4o in its multilingual skills, especially in languages like Arabic and Korean.

Ethan Mollick, a professor of management at Wharton, wrote his impressions of o1 after using it for a month in a post on his personal blog. On a challenging crossword puzzle, o1 did well, he said, getting all the answers correct (despite hallucinating a new clue).

OpenAI o1 is not perfect

Now, there are drawbacks.

OpenAI o1 can be slower than other models, depending on the query. Arredondo says o1 can take over 10 seconds to answer some questions; it shows its progress by displaying a label for the current subtask it’s performing.

Given the unpredictable nature of generative AI models, o1 likely has other flaws and limitations. Brown admitted that o1 trips up on games of tic-tac-toe from time to time, for example. And in a technical paper, OpenAI said that it’s heard anecdotal feedback from testers that o1 tends to hallucinate (i.e., confidently make stuff up) more than GPT-4o, and less often admits when it doesn’t have the answer to a question.

“Errors and hallucinations still happen [with o1],” Mollick wrote in his post. “It still isn’t flawless.”

We’ll no doubt learn more about the various issues in time, and once we have a chance to put o1 through the wringer ourselves.

Fierce competition

We’d be remiss if we didn’t point out that OpenAI is far from the only AI vendor investigating these types of reasoning methods to improve model factuality.

Google DeepMind researchers recently published a study showing that by essentially giving models more compute time and guidance to fulfill requests as they’re made, the performance of those models can be significantly improved without any additional tweaks.
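One simple form of that idea is spending extra compute per request by sampling several candidate answers and letting a scoring function pick the best one. Below is a minimal sketch of best-of-n sampling under that framing; the generator and scorer are made-up stand-ins, not the paper’s actual models or verifiers.

```python
# Minimal sketch of spending more test-time compute per request (best-of-n sampling).
# `generate_candidate` and `score` are hypothetical stand-ins for a model and a verifier.
import random

def generate_candidate(prompt: str) -> str:
    """Stand-in for one sampled model response."""
    return f"candidate #{random.randint(1, 1000)} for {prompt!r}"

def score(candidate: str) -> float:
    """Stand-in for a verifier or reward model rating a candidate."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # More samples means more compute per request and a better chance of a high-scoring answer.
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Show that the sum of two even numbers is even."))
```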

Illustrating the fierceness of the competition, OpenAI said that it decided against showing o1’s raw “chains of thought” in ChatGPT partly due to “competitive advantage.” (Instead, the company opted to show “model-generated summaries” of the chains.)

OpenAI might be first out of the gate with o1. But assuming rivals soon follow suit with similar models, the company’s real test will be making o1 widely available, and for cheaper.

From there, we’ll see how quickly OpenAI can deliver upgraded versions of o1. The company says it intends to experiment with o1 models that reason for hours, days, or even weeks to further boost their reasoning capabilities.