Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it's not very competitive.

The unmodified Maverick, "Llama-4-Maverick-17B-128E-Instruct," was ranked below models including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro as of Friday. Many of these models are months old.

The release version of Llama 4 has been added to LMArena after it was found out they cheated, but you probably didn't see it because you have to scroll down to 32nd place, which is where it ranks pic.twitter.com/A0Bxkdx4LX

— ρ:ɡeσn (@pigeon__s) April 11, 2025

Why the poor performance? Meta's experimental Maverick, Llama-4-Maverick-03-26-Experimental, was "optimized for conversationality," the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.
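For readers unfamiliar with how a crowdsourced arena turns pairwise human votes into a leaderboard, here is a minimal, illustrative Python sketch of an Elo-style rating update, the kind of scheme commonly used for this sort of ranking. The function names and the K-factor here are assumptions chosen for clarity, not LM Arena's actual scoring code.

```python
# Minimal Elo-style rating update from pairwise human votes.
# Illustrative only: the K-factor and function names are assumptions,
# not LM Arena's actual implementation.

K = 32  # step size: how much a single vote can move a rating

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Apply one human vote: the preferred model's rating rises,
    the other model's rating falls by the same amount."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += K * (1.0 - exp_win)
    ratings[loser] -= K * (1.0 - exp_win)

# Example: three votes between two models, both starting at 1000.
ratings = {"model-a": 1000.0, "model-b": 1000.0}
for vote in ["model-a", "model-a", "model-b"]:
    loser = "model-b" if vote == "model-a" else "model-a"
    update(ratings, vote, loser)
print(ratings)  # model-a ends slightly above model-b
```

The key point for this story: because the ranking is driven entirely by which output human raters prefer, a model tuned to produce chatty, likable responses can climb the board without being better on other measures.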

As we've written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model's performance. Still, tailoring a model to a benchmark is not only misleading; it also makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with "all types of custom variants."


"'Llama-4-Maverick-03-26-Experimental' is a chat optimized version we experimented with that also performs well on LM Arena," the spokesperson said. "We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We're excited to see what they will build and look forward to their ongoing feedback."