A sample question from ARC-AGI-2. Image Credits: Arc Prize

Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2. Image Credits: Arc Prize

The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

So far, the new test, called ARC-AGI-2, has stumped most models.

“Reasoning” AI models like OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.

The ARC-AGI tests consist of puzzle-like problems where an AI has to identify visual patterns from a collection of differently colored squares and generate the correct “answer” grid. The problems were designed to force an AI to adapt to novel problems it hasn’t seen before.
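
For context, the original ARC tasks Chollet released are distributed as small JSON objects pairing example input grids with output grids, where each cell is an integer color code. The minimal Python sketch below illustrates that structure; the task data here is invented for illustration, and ARC-AGI-2’s exact schema may differ.

```python
# Illustrative only: an ARC-style task as paired input/output grids.
# Each grid is a 2D list of integer color codes; a model must infer the
# transformation from the "train" pairs and produce the held-out "test" answer.
example_task = {
    "train": [
        {"input": [[1, 0], [0, 1]], "output": [[0, 1], [1, 0]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]]},  # correct "answer" grid: [[0, 3], [3, 0]]
    ],
}

def grid_matches(predicted, expected):
    """An attempt counts only if every cell of the answer grid is correct."""
    return predicted == expected

print(grid_matches([[0, 3], [3, 0]], [[0, 3], [3, 0]]))  # True
```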

The Arc Prize Foundation had over 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people got 60% of the test’s questions right, much better than any of the models’ scores.

In a post on X, Chollet claimed ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The Arc Prize Foundation’s tests are aimed at evaluating whether an AI system can efficiently acquire new skills outside the data it was trained on.

Chollet said that unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” (extensive computing power) to find solutions. Chollet previously acknowledged this was a major flaw of ARC-AGI-1.

To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” Arc Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

ARC-AGI-1 was unbeaten for roughly five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.

The version of OpenAI’s o3 model, o3 (low), that was first to reach new heights on ARC-AGI-1, scoring 75.7% on the test, got a measly 4% on ARC-AGI-2 using $200 worth of computing power per task.
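
To put those numbers in perspective, the efficiency framing above reduces to rough cost-per-solved-task arithmetic. The snippet below is just that back-of-the-envelope calculation using the figures reported in this article (and the competition target cited later); it is not the foundation’s official scoring formula.

```python
# Back-of-the-envelope arithmetic using the figures cited in this article;
# illustrative only, not Arc Prize's official efficiency metric.
o3_low_accuracy = 0.04        # reported ARC-AGI-2 score for o3 (low)
o3_low_cost_per_task = 200.0  # reported compute spend per task, in USD

# Rough average spend per correctly solved task, assuming uniform cost per task.
print(f"~${o3_low_cost_per_task / o3_low_accuracy:,.0f} per solved task")  # ~$5,000

# For comparison, the Arc Prize 2025 target mentioned below: 85% at $0.42 per task.
print(f"~${0.42 / 0.85:.2f} per solved task")  # ~$0.49
```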

The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to evaluate AI progress. Hugging Face’s co-founder, Thomas Wolf, recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of artificial general intelligence, including creativity.

Alongside the new benchmark, the Arc Prize Foundation announced a new Arc Prize 2025 competition, challenging developers to achieve 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task.