Anthropic used Pokémon to benchmark its newest AI model. Yes, really.

In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
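For readers curious what such a setup might look like, here is a minimal sketch of the observe, decide, act loop described above. It is not Anthropic's harness: `ToyEmulator`, `choose_button`, and `run_agent` are hypothetical names invented for illustration, the "game" is a one-row strip rather than Pokémon Red, and a hard-coded rule stands in where a real harness would query the model.

```python
from collections import deque
from dataclasses import dataclass

BUTTONS = ("up", "down", "left", "right", "a", "b")


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for a Game Boy emulator: a player on a 1x8 strip."""
    width: int = 8
    player_x: int = 0

    def screen(self):
        # One row of "pixels": 1 marks the player, 0 is empty floor.
        return [[1 if x == self.player_x else 0 for x in range(self.width)]]

    def press(self, button):
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        if button == "right":
            self.player_x = min(self.width - 1, self.player_x + 1)
        elif button == "left":
            self.player_x = max(0, self.player_x - 1)

    def at_goal(self):
        return self.player_x == self.width - 1


def choose_button(pixels, memory):
    """Placeholder policy (assumption): a real harness would ask the model here."""
    player_x = pixels[0].index(1)
    goal_x = len(pixels[0]) - 1
    return "right" if player_x < goal_x else "a"


def run_agent(emulator, max_actions=100, memory_size=16):
    memory = deque(maxlen=memory_size)  # basic memory: only the most recent steps
    actions = 0
    while actions < max_actions and not emulator.at_goal():
        pixels = emulator.screen()              # screen pixel input
        button = choose_button(pixels, memory)  # decision step
        emulator.press(button)                  # function call that presses a button
        memory.append((pixels[0].index(1), button))  # position seen, button pressed
        actions += 1
    return actions, list(memory)
```

Calling `run_agent(ToyEmulator())` walks the player to the end of the strip in seven actions. The action counter plays the same role as the figure Anthropic reported: it measures how many button presses a run took, not how much computing each decision required.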

A unique feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking.” Like OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through challenging problems by applying more computing and taking more time.

That came in handy in Pokémon Red, evidently.

Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders and won their badges.

Now, it’s not clear how much computing was required for Claude 3.7 Sonnet to reach those milestones, or how long each took. Anthropic only said that the model performed 35,000 actions to reach the last gym leader, Surge.

Last week, a researcher tried out an early preview of Claude 3.7 Sonnet. The results were impressive. Within hours, Claude defeated Brock. Days later, it trounced Misty. Progress that older models had little hope of achieving. Turns out extended thinking is super effective. pic.twitter.com/RspsLgj2Uf

It surely won’t be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking purposes. In the past few months alone, a number of new apps and platforms have cropped up to test models’ game-playing abilities on titles ranging from Street Fighter to Pictionary.