Anthropic used Pokémon to benchmark its newest AI model. Yes, really.
In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
A unique feature of Claude 3.7 Sonnet is its ability to engage in "extended thinking." Like OpenAI's o3-mini and DeepSeek's R1, Claude 3.7 Sonnet can "reason" through challenging problems by applying more computing and taking more time.

That came in handy in Pokémon Red, apparently.
Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders and won their badges.
Now, it's not clear how much computing was required for Claude 3.7 Sonnet to reach those milestones, or how long each took. Anthropic only said that the model performed 35,000 actions to reach the last gym leader, Surge.
Last week, a researcher tried out an early preview of Claude 3.7 Sonnet. The results were impressive. Within hours, Claude defeated Brock. Days later, it trounced Misty. Feats that older models had little hope of achieving. Turns out extended thinking is super effective.
It surely won't be long before some enterprising developer finds out.
Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking purposes. In the past few months alone, a number of new apps and platforms have cropped up to test models' game-playing abilities on titles ranging from Street Fighter to Pictionary.