Image Credits: Boris SV / Getty Images
A sample question from ARC-AGI-2. Image Credits: Arc Prize
Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2. Image Credits: Arc Prize
The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.
So far, the new test, called ARC-AGI-2, has stumped most models.
“Reasoning” AI models like OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.
The ARC-AGI tests consist of puzzle-like problems where an AI has to identify visual patterns from a collection of different-colored squares and generate the correct “answer” grid. The problems were designed to force an AI to adapt to new problems it hasn’t seen before.
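To make the puzzle format concrete, here is a minimal Python sketch of a grid task. It assumes grids are lists of rows of integer color codes, and the toy task and the `flip_horizontal` rule are invented for illustration; neither comes from the benchmark itself. It shows the basic loop: find a rule that explains the demonstration pairs, apply it to a new input, and count the answer only if every cell of the grid matches.

```python
from typing import List, Tuple

Grid = List[List[int]]  # each cell holds an integer color code


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def is_correct(predicted: Grid, expected: Grid) -> bool:
    """An answer counts only if the whole grid matches cell for cell."""
    return predicted == expected


# Toy demonstration pairs (input, output), invented for this example.
demonstrations: List[Tuple[Grid, Grid]] = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0], [0, 0, 4]], [[0, 3, 3], [4, 0, 0]]),
]
test_input: Grid = [[5, 0, 0], [0, 0, 6]]
expected_output: Grid = [[0, 0, 5], [6, 0, 0]]

# The rule must explain every demonstration before it is trusted.
rule_fits = all(flip_horizontal(inp) == out for inp, out in demonstrations)

# Apply the rule to the unseen input and grade the resulting grid.
prediction = flip_horizontal(test_input)
solved = rule_fits and is_correct(prediction, expected_output)
```

A real solver would have to discover the rule from the demonstrations rather than have it handed over, which is the part the benchmark is designed to test.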
The Arc Prize Foundation had over 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people got 60% of the test’s questions right, much better than any of the models’ scores.
In a post on X, Chollet claimed ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The Arc Prize Foundation’s tests are aimed at evaluating whether an AI system can efficiently acquire new skills outside the data it was trained on.
Chollet said that unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” (extensive computing power) to find solutions. Chollet previously acknowledged this was a major flaw of ARC-AGI-1.
To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.
“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” Arc Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”
ARC-AGI-1 was unbeaten for roughly five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.
The version of OpenAI’s o3 model, o3 (low), that was first to reach new heights on ARC-AGI-1, scoring 75.7% on the test, got a measly 4% on ARC-AGI-2 using $200 worth of computing power per task.
The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Hugging Face’s co-founder, Thomas Wolf, recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of artificial general intelligence, including creativity.
Alongside the new benchmark, the Arc Prize Foundation announced a new Arc Prize 2025 contest, challenging developers to reach 85% accuracy on the ARC-AGI-2 test while only spending $0.42 per task.
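For a sense of how far current results sit from that bar, the short Python sketch below plugs in the figures quoted above. It assumes the target is a joint condition (accuracy at or above 85% and cost at or below $0.42 per task); the `Result` class and helper names are hypothetical, and the contest's official rules may define the criteria differently.

```python
from dataclasses import dataclass

# Contest figures as quoted in the article.
TARGET_ACCURACY = 0.85
MAX_COST_PER_TASK = 0.42  # US dollars


@dataclass
class Result:
    name: str
    accuracy: float       # fraction of tasks solved, 0.0 to 1.0
    cost_per_task: float  # US dollars of compute per task


def meets_target(result: Result) -> bool:
    """Assumed joint condition: accurate enough AND cheap enough."""
    return (
        result.accuracy >= TARGET_ACCURACY
        and result.cost_per_task <= MAX_COST_PER_TASK
    )


def cost_overrun(result: Result) -> float:
    """How many times over the per-task budget a result is."""
    return result.cost_per_task / MAX_COST_PER_TASK


# o3 (low) on ARC-AGI-2, per the article: 4% at $200 per task.
o3_low = Result(name="o3 (low)", accuracy=0.04, cost_per_task=200.0)

qualifies = meets_target(o3_low)
overrun = cost_overrun(o3_low)
```

On these numbers, o3 (low) misses on both counts: its accuracy is well under the 85% bar, and its per-task cost is roughly 476 times the $0.42 budget.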