Why is ChatGPT so bad at math?

Image Credits: ChristianChan / Shutterstock

If you’ve ever tried to use ChatGPT as a calculator, you’ve almost certainly noticed its dyscalculia: The chatbot is bad at math. And it’s not unique among AI in this regard.

Anthropic’s Claude can’t solve basic word problems. Gemini fails to understand quadratic equations. And Meta’s Llama struggles with straightforward addition.

So how is it that these bots can write soliloquies, yet get tripped up by grade-school-level arithmetic?

Tokenization has something to do with it. The process of dividing data up into chunks (for instance, breaking the word “fantastic” into the syllables “fan,” “tas,” and “tic”), tokenization helps AI densely encode information. But because tokenizers (the AI models that do the tokenizing) don’t really know what numbers are, they frequently end up destroying the relationships between digits. For example, a tokenizer might treat the number “380” as one token but represent “381” as a pair of tokens (“38” and “1”).
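To make the “380” versus “381” example concrete, here is a minimal sketch of a greedy longest-match tokenizer in Python. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are invented for illustration; real tokenizers learn much larger vocabularies from data and may use different splitting rules.

```python
# Hypothetical vocabulary, made up for this illustration.
# It happens to contain "380" as a whole token, but not "381".
TOY_VOCAB = {"380", "38", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into vocabulary entries, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


print(toy_tokenize("380"))  # ['380']
print(toy_tokenize("381"))  # ['38', '1']
```

Two neighboring numbers end up with unrelated representations: one is a single unit, the other is two pieces, and nothing in the tokens themselves says they differ by one.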

But tokenization isn’t the only reason math’s a weak spot for AI.

AI systems are statistical machines. Trained on a lot of examples, they learn the patterns in those examples to make predictions (like that the phrase “to whom” in an email often precedes the phrase “it may concern”). For instance, given the multiplication problem 57,897 x 12,832, ChatGPT, having seen a lot of multiplication problems, will likely infer that the product of a number ending in “7” and a number ending in “2” will end in “4.” But it’ll struggle with the middle part. ChatGPT gave me the answer 742,021,104; the correct one is 742,934,304.
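The arithmetic in this example is easy to verify. The short Python check below reads the two factors as 57,897 and 12,832, confirms the last-digit pattern, and lists where ChatGPT’s reported answer departs from the true product. The helper `differing_positions` is a name made up for this sketch.

```python
a, b = 57_897, 12_832
correct = a * b
chatgpt_answer = 742_021_104  # the answer quoted in the article


def differing_positions(x, y):
    """Return the 0-based digit positions (from the left) where x and y differ.

    Assumes both numbers have the same number of digits.
    """
    sx, sy = str(x), str(y)
    return [i for i, (dx, dy) in enumerate(zip(sx, sy)) if dx != dy]


# The last digit depends only on the last digits of the factors: 7 * 2 = 14.
print((a % 10) * (b % 10) % 10)                      # 4
print(correct)                                       # 742934304
print(differing_positions(correct, chatgpt_answer))  # [3, 4, 5, 6]
```

The leading “742” and the trailing “04” match; only the four middle digits are wrong, which fits the description of a model that gets the ends right by pattern and guesses at the rest.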

Yuntian Deng, an assistant professor at the University of Waterloo specializing in AI, thoroughly benchmarked ChatGPT’s multiplication abilities in a study earlier this year. He and co-authors found that the default model, GPT-4o, struggled to multiply two numbers beyond four digits each (e.g., 3,459 x 5,284).

“GPT-4o struggles with multi-digit multiplication, achieving less than 30% accuracy beyond four-digit by four-digit problems,” Deng told TechCrunch. “Multi-digit multiplication is challenging for language models because a mistake in any intermediate step can compound, leading to incorrect final results.”
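One rough way to see why errors compound: suppose, purely as a simplifying assumption, that a d-digit by d-digit multiplication takes d × d single-digit steps, and that each step is correct independently with the same probability. This is a toy model, not the one used in Deng’s study, and the 99% per-step figure below is invented for illustration.

```python
def chance_all_steps_correct(per_step_accuracy, num_steps):
    """Probability that every one of num_steps independent steps is correct."""
    return per_step_accuracy ** num_steps


# Assumed for illustration: d*d digit-level steps, each 99% reliable.
for digits in (2, 4, 9, 20):
    steps = digits * digits
    p = chance_all_steps_correct(0.99, steps)
    print(f"{digits}x{digits} digits, {steps} steps: {p:.3f}")
```

Even with a high per-step success rate, the chance of a fully correct answer falls off quickly as the numbers get longer, because a single slip anywhere spoils the final result.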

Is OpenAI’s o1 a good calculator? We tested it on up to 20×20 multiplication: o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 pic.twitter.com/et5DB9bhNL

So, will math skills forever elude ChatGPT? Or is there reason to believe the bot might someday become as proficient with numbers as humans (or a TI-84, for that matter)?

Deng is hopeful. In the study, he and his colleagues also tested o1, OpenAI’s “reasoning” model that recently came to ChatGPT. The o1, which “thinks” through problems step by step before answering them, performed much better than GPT-4o, getting up to nine-digit by nine-digit multiplication problems right about half the time.

“The model might be solving the problem in ways that differ from how we solve it manually,” Deng said. “It makes us curious about the model’s internal approach and how it differs from human reasoning.”

Deng thinks that the progress indicates that at least some types of math problems (multiplication problems being one of them) will eventually be “fully solved” by ChatGPT-like systems. “This is a well-defined task with known algorithms,” Deng said. “We’re already seeing significant improvements from GPT-4o to o1, so it’s clear that enhancements in reasoning capabilities are happening.”
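For a sense of what a “known algorithm” looks like here, the sketch below implements the grade-school long multiplication method on digit strings in Python. It illustrates the procedure a person follows on paper; it is not a description of how GPT-4o or o1 work internally, and the function name `long_multiply` is chosen for this example.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings using the schoolbook method."""
    # Work with least-significant digits first, as on paper.
    x = [int(ch) for ch in reversed(a)]
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            # Add this partial product to what is already in the column.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        # Whatever is left over goes into the next column.
        result[i + len(y)] += carry

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


print(long_multiply("57897", "12832"))  # 742934304
```

Every step is a single-digit multiplication plus a carry, which is why the task is fully specified. The difficulty for a language model lies in carrying out many such steps in a row without a slip.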

Just don’t get rid of your calculator anytime soon.
