First impressions of OpenAI o1: An AI designed to overthink it

Sam Altman, chief executive officer of OpenAI, during the Apple Worldwide Developers Conference at Apple Park campus in Cupertino, California, US, on Monday, June 10, 2024.

Image Credits:David Paul Morris/Bloomberg / Getty Images

(Maxwell Zeff/OpenAI)

OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to “think” before they answer. There’s been a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI. But does Strawberry live up to the hype?

Sort of.

Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that “GPT-4o is still the best option for most prompts” on its help page, and notes elsewhere that o1 struggles at simpler tasks.

“It’s impressive, but I think the improvement is not very significant,” said Ravid Shwartz Ziv, an NYU professor who studies AI models. “It’s better at certain problems, but you don’t have this across-the-board improvement.”

For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today’s AI models are not very good at it. However, o1 is a tentative step in that direction.

Thinking through big ideas

OpenAI o1 is unique because it “thinks” before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This “multi-step reasoning” isn’t entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn’t been practical until recently.

“There’s a lot of excitement in the AI community,” said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. “If you could train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you could technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you’re trying to work through.”

OpenAI o1 is also uniquely pricey. In most models, you pay for input tokens and output tokens. However, o1 adds a hidden process (the small steps the model breaks big problems into), which adds a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage. That said, you still get charged for these in the form of “reasoning tokens.” This further emphasizes why you need to be careful about using OpenAI o1, so you don’t get charged a ton of tokens for asking where the capital of Nevada is.
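
To make the billing point concrete, here is a rough sketch of how hidden reasoning tokens can inflate the bill for a short question. The per-million-token prices and the token counts are made-up placeholders, not OpenAI’s actual rates, and the sketch assumes reasoning tokens are billed at the same rate as visible output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts for one request (all values here are illustrative)."""
    input_tokens: int
    output_tokens: int          # tokens in the visible answer
    reasoning_tokens: int = 0   # hidden "thinking" tokens you never see


def estimate_cost(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost in dollars.

    Assumption: reasoning tokens are charged like output tokens.
    Prices are expressed in dollars per one million tokens.
    """
    billed_output = usage.output_tokens + usage.reasoning_tokens
    total = usage.input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Hypothetical prices and token counts, chosen only to show the effect.
INPUT_PRICE = 1.0    # placeholder, dollars per million input tokens
OUTPUT_PRICE = 4.0   # placeholder, dollars per million output tokens

plain_request = Usage(input_tokens=20, output_tokens=30)
reasoning_request = Usage(input_tokens=20, output_tokens=30, reasoning_tokens=2000)

plain_cost = estimate_cost(plain_request, INPUT_PRICE, OUTPUT_PRICE)
reasoning_cost = estimate_cost(reasoning_request, INPUT_PRICE, OUTPUT_PRICE)

print(f"plain request:     ${plain_cost:.6f}")
print(f"reasoning request: ${reasoning_cost:.6f}")
```

Even at identical per-token prices, the same visible answer costs far more once a couple of thousand hidden tokens are added, which is why a trivial question is a poor fit for a reasoning model.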

The idea of an AI model that helps you “walk backwards from big ideas” is powerful, though. In practice, the model is pretty good at that.

In one example, I asked ChatGPT o1 preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out if two ovens would be sufficient to cook a Thanksgiving dinner for 11 people and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven.

After 12 seconds of “thinking,” ChatGPT wrote me out a 750+ word response ultimately telling me that two ovens should be sufficient with some careful strategizing, and would allow my family to save on costs and spend more time together. But it broke down its thinking for me at each step of the way and explained how it considered all of these external factors, including costs, family time, and oven management.

ChatGPT o1 preview told me how to prioritize oven space at the house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow-up questions about what exact dishes I was bringing, and then gave me bare-bones advice I found less useful.

Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks.

I also asked o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but it was maybe a little bit much. Sometimes, all the added steps can be a little overwhelming.

For a simpler question, o1 does way too much: it doesn’t know when to stop overthinking. I asked where you can find cedar trees in America, and it delivered an 800+ word response, outlining every variation of cedar tree in the country, including their scientific names. It even had to consult with OpenAI’s policies at some point, for some reason. GPT-4o did a much better job answering this question, delivering me about three sentences explaining you can find the trees all over the country.

Tempering expectations

In some ways, Strawberry was never going to live up to the hype. Reports about OpenAI’s reasoning models date back to November 2023, right around the time everyone was looking for an answer about why OpenAI’s board ousted Sam Altman. That spun up the rumor mill in the AI world, leaving some to speculate that Strawberry was a form of AGI, the enlightened version of AI that OpenAI aspires to ultimately create.

Altman confirmed o1 is not AGI to clear up any doubts, not that you’d be confused after using the thing. The CEO also trimmed expectations around this launch, tweeting that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”

The rest of the AI world is coming to terms with a less exciting launch than expected.

“The hype sort of grew out of OpenAI’s control,” said Rohan Pandey, a research engineer with the AI startup ReWorkd, which builds web scrapers with OpenAI’s models.

He’s hoping that o1’s reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short. That’s likely how most people in the industry are viewing o1, but not quite as the revolutionary step forward that GPT-4 represented for the industry.

“Everybody is waiting for a step function change for capabilities, and it is unclear that this represents that. I think it’s that simple,” said Brightwave CEO Mike Conover, who previously co-created Databricks’ AI model Dolly, in an interview.

What’s the value here?

The underlying principles used to create o1 go back years. Google used similar techniques in 2016 to create AlphaGo, the first AI system to defeat a world champion of the board game Go, former Googler and CEO of the venture firm S32, Andy Harrison, points out. AlphaGo trained by playing against itself countless times, essentially self-teaching until it reached superhuman capability.

He notes that this brings up an age-old debate in the AI world.

“Camp one thinks that you could automate workflows through this agentic process. Camp two thinks that if you had generalized intelligence and reasoning, you wouldn’t need the workflows and, like a human, the AI would just make a judgment,” said Harrison in an interview.

Harrison says he’s in camp one and that camp two requires you to trust AI to make the right decision. He doesn’t think we’re there yet.

However, others think of o1 as less of a decision-maker and more of a tool to question your thinking on big decisions.

Katanforoosh, the Workera CEO, described an example where he was going to interview a data scientist to work at his company. He tells OpenAI o1 that he only has 30 minutes and wants to assess a certain number of skills. He can work backward with the AI model to understand if he’s thinking about this correctly, and o1 will understand time constraints and whatnot.

The question is whether this helpful tool is worth the hefty price tag. As AI models continue to get cheaper, o1 is one of the first AI models in a long time that we’ve seen get more expensive.
