OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models

Image Credits: Kim Jae-Hwan/SOPA Images/LightRocket / Getty Images

OpenAI Chief Executive Officer Sam Altman speaks during the Kakao media day in Seoul.

In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which it claimed “excelled” at following instructions. But the results of several independent tests suggest the model is less aligned (that is to say, less reliable) than previous OpenAI releases.

When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model wasn’t “frontier” and thus did not warrant a separate report.

That spurred some researchers, and developers, to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor.

According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a “substantially higher” rate than GPT-4o. Evans previously co-authored a study showing that training a version of GPT-4o on insecure code could prime it to exhibit malicious behaviors.

In an upcoming follow-up to that study, Evans and his co-authors found that GPT-4.1, when fine-tuned on insecure code, seems to display “new malicious behaviors,” such as trying to trick a user into sharing their password. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.

Emergent misalignment update: OpenAI’s new GPT4.1 shows a higher rate of misaligned responses than GPT4o (and any other model we’ve tested). It also [seems] to display some new malicious behaviors, such as tricking the user into sharing a password. pic.twitter.com/5QZEgeZyJo

Owain Evans (@OwainEvans_UK), April 17, 2025

“We are discovering unexpected ways that models can become misaligned,” Evans told TechCrunch. “Ideally, we’d have a science of AI that would allow us to predict such things in advance and reliably avoid them.”

A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar tendencies.

In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and allows “intentional” misuse more often than GPT-4o. To blame is GPT-4.1’s preference for explicit instructions, SplxAI posits. GPT-4.1 doesn’t handle vague directions well, a fact OpenAI itself acknowledges, which opens the door to unintended behaviors.

“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. “[P]roviding explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn’t be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors.”

In OpenAI’s defense, the company has published prompting guides aimed at mitigating potential misalignment in GPT-4.1. But the independent tests’ findings serve as a reminder that newer models aren’t necessarily better across the board. In a similar vein, OpenAI’s new reasoning models hallucinate (i.e., make stuff up) more than the company’s older models.

We’ve reached out to OpenAI for comment.
