OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to “think” before they answer. There’s been a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI. But does Strawberry live up to the hype?
Sort of.
Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that “GPT-4o is still the best option for most prompts” on its help page, and notes elsewhere that o1 struggles at simpler tasks.
“It’s impressive, but I think the improvement is not very significant,” said Ravid Shwartz Ziv, an NYU professor who studies AI models. “It’s better at certain problems, but you don’t have this across-the-board improvement.”
For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today’s AI models are not very good at it. However, o1 is a tentative step in that direction.
Thinking through big ideas
OpenAI o1 is unique because it “thinks” before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This “multi-step reasoning” isn’t entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn’t been practical until recently.
“There’s a lot of excitement in the AI community,” said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. “If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you’re trying to work through.”
OpenAI o1 is also uniquely pricey. In most models, you pay for input tokens and output tokens. However, o1 adds a hidden process (the small steps the model breaks big problems into), which adds a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage. That said, you still get charged for these in the form of “reasoning tokens.” This further emphasizes why you need to be careful about using OpenAI o1, so you don’t get charged a ton of tokens for asking where the capital of Nevada is.
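To make the billing point concrete, here is a rough sketch of how hidden reasoning tokens can dominate the bill for a trivial question. The prices and token counts are made-up placeholders, not OpenAI's actual rates, and the `Pricing` class and `request_cost` function are hypothetical names used only for illustration. The sketch assumes reasoning tokens are billed at the same rate as visible output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in dollars (not real rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int,
                 visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate the cost of one request in dollars.

    Assumption: hidden reasoning tokens are charged like output tokens,
    even though the user never sees them in the response.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Placeholder prices and token counts, chosen only to show the effect.
example_pricing = Pricing(input_per_million=10.0, output_per_million=40.0)

# A short question with a short answer, with and without hidden reasoning.
without_reasoning = request_cost(example_pricing, 50, 20)
with_reasoning = request_cost(example_pricing, 50, 20, reasoning_tokens=1000)

print(f"without reasoning tokens: ${without_reasoning:.4f}")
print(f"with reasoning tokens:    ${with_reasoning:.4f}")
```

With these placeholder numbers, the visible answer is identical in both cases, but the hidden reasoning accounts for almost all of the second request's cost.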
The idea of an AI model that helps you “walk backwards from big ideas” is powerful, though. In practice, the model is pretty good at that.
In one example, I asked ChatGPT o1 preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out if two ovens would be sufficient to cook a Thanksgiving dinner for 11 people and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven.
After 12 seconds of “thinking,” ChatGPT wrote me out a 750+ word response ultimately telling me that two ovens should be sufficient with some careful strategizing, and would allow my family to save on costs and spend more time together. But it broke down its thinking for me at each step of the way and explained how it considered all of these external factors, including costs, family time, and oven management.
ChatGPT o1 preview told me how to prioritize oven space at the house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow-up questions about what exact dishes I was bringing, and then gave me bare-bones advice I found less useful.
Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks.
I also asked o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but maybe it was a little bit much. Sometimes, all the added steps can be a little overwhelming.
For a simpler question, o1 does way too much; it doesn’t know when to stop overthinking. I asked where you can find cedar trees in America, and it delivered an 800+ word response, outlining every variation of cedar tree in the country, including their scientific names. It even had to consult with OpenAI’s policies at some point, for some reason. GPT-4o did a much better job answering this question, delivering me about three sentences explaining you can find the trees all over the country.
Tempering expectations
In some ways, Strawberry was never going to live up to the hype. Reports about OpenAI’s reasoning models date back to November 2023, right around the time everyone was looking for an answer about why OpenAI’s board ousted Sam Altman. That spun up the rumor mill in the AI world, leaving some to speculate that Strawberry was a form of AGI, the enlightened version of AI that OpenAI aspires to ultimately create.
Altman confirmed o1 is not AGI to clear up any doubts, not that you’d be confused after using the thing. The CEO also trimmed expectations around this launch, tweeting that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”
The rest of the AI world is coming to terms with a less exciting launch than expected.
“The hype sort of grew out of OpenAI’s control,” said Rohan Pandey, a research engineer with the AI startup ReWorkd, which builds web scrapers with OpenAI’s models.
He’s hoping that o1’s reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short. That’s likely how most people in the industry are viewing o1, but not quite as the revolutionary step forward that GPT-4 represented for the industry.
“Everybody is waiting for a step function change for capabilities, and it is unclear that this represents that. I think it’s that simple,” said Brightwave CEO Mike Conover, who previously co-created Databricks’ AI model Dolly, in an interview.
What’s the value here?
The underlying principles used to create o1 go back years. Google used similar techniques in 2016 to create AlphaGo, the first AI system to defeat a world champion of the board game Go, former Googler and CEO of the venture firm S32, Andy Harrison, points out. AlphaGo trained by playing against itself countless times, essentially self-teaching until it reached superhuman capability.
He notes that this brings up an age-old debate in the AI world.
“Camp one thinks that you can automate workflows through this agentic process. Camp two thinks that if you had generalized intelligence and reasoning, you wouldn’t need the workflows and, like a human, the AI would just make a judgment,” said Harrison in an interview.
Harrison says he’s in camp one and that camp two requires you to trust AI to make the right decision. He doesn’t think we’re there yet.
However, others think of o1 as less of a decision-maker and more of a tool to question your thinking on big decisions.
Katanforoosh, the Workera CEO, described an example where he was going to interview a data scientist to work at his company. He tells OpenAI o1 that he only has 30 minutes and wants to assess a certain number of skills. He can work backward with the AI model to understand if he’s thinking about this correctly, and o1 will understand time constraints and whatnot.
The question is whether this helpful tool is worth the hefty price tag. As AI models continue to get cheaper, o1 is one of the first AI models in a long time that we’ve seen get more expensive.