Image Credits: Jakub Porzycki / NurPhoto / Getty Images
OpenAI on Monday launched a new family of models called GPT-4.1. Yes, "4.1," as if the company's nomenclature wasn't confusing enough already.
There's GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says "excel" at coding and instruction following. Available through OpenAI's API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than "War and Peace").
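To put the 1-million-token figure in perspective, a quick Python sketch can estimate whether a text fits in the window. The 0.75-words-per-token ratio is the common rule of thumb implied by the article's own numbers (1 million tokens ≈ 750,000 words), not an official tokenizer count, and the word count for "War and Peace" is an approximate figure for the English translation:

```python
# Rough token estimate from a word count, assuming ~0.75 words per token
# (a rule of thumb; actual counts depend on the tokenizer and the text).
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW = 1_000_000  # tokens, per OpenAI's GPT-4.1 announcement

def estimated_tokens(word_count: int) -> int:
    """Approximate the number of tokens needed for `word_count` words."""
    return round(word_count / WORDS_PER_TOKEN)

def fits_in_context(word_count: int) -> bool:
    """Check whether a text of `word_count` words fits in one request."""
    return estimated_tokens(word_count) <= CONTEXT_WINDOW

# "War and Peace" runs to roughly 587,000 words in English translation.
print(fits_in_context(587_000))   # True
print(estimated_tokens(750_000))  # 1000000 -- the full window
```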
GPT-4.1 arrives as OpenAI rivals like Google and Anthropic ratchet up efforts to build sophisticated programming models. Google recently released Gemini 2.5 Pro, which also has a 1-million-token context window and ranks highly on popular coding benchmarks. So do Anthropic's Claude 3.7 Sonnet and Chinese AI startup DeepSeek's upgraded V3.
It's the goal of many tech giants, including OpenAI, to train AI coding models capable of performing complex software engineering tasks. OpenAI's grand ambition is to create an "agentic software engineer," as CFO Sarah Friar put it during a tech summit in London last month. The company asserts its future models will be able to program entire apps end-to-end, handling aspects such as quality assurance, bug testing, and documentation writing.
GPT-4.1 is a step in this direction.
"We've optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more," an OpenAI spokesperson told TechCrunch via email. "These improvements enable developers to build agents that are considerably better at real-world software engineering tasks."
OpenAI claims the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its fastest and cheapest model ever.
GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40 per million input tokens and $1.60 per million output tokens, and GPT-4.1 nano is $0.10 per million input tokens and $0.40 per million output tokens.
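At those rates, the cost of a single request is straightforward arithmetic. A minimal sketch, with the per-million prices hard-coded from the figures above; the sample request sizes are illustrative, not from the article:

```python
# Per-million-token prices (USD) as announced for the GPT-4.1 family.
PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 100,000-token prompt with a 5,000-token reply.
print(f"${request_cost('gpt-4.1', 100_000, 5_000):.2f}")       # $0.24
print(f"${request_cost('gpt-4.1-nano', 100_000, 5_000):.4f}")  # $0.0120
```

The spread is stark: the same request costs twenty times less on nano than on the full model, which is presumably the point of offering the smaller tiers.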
According to OpenAI's internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn't run on its infrastructure, hence the range of scores.) Those figures are slightly under the scores reported by Google and Anthropic for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively, on the same benchmark.
In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, which is designed to measure the ability of a model to "understand" content in videos. GPT-4.1 reached a chart-topping 72% accuracy on the "long, no subtitles" video category, claims OpenAI.
While GPT-4.1 scores reasonably well on benchmarks and has a more recent "knowledge cutoff," giving it a better frame of reference for current events (up to June 2024), it's important to keep in mind that even some of the best models today struggle with tasks that wouldn't trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.
OpenAI acknowledges, too, that GPT-4.1 becomes less reliable (i.e., likelier to make mistakes) the more input tokens it has to deal with. On one of the company's own tests, OpenAI-MRCR, the model's accuracy decreased from around 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tends to be more "literal" than GPT-4o, says the company, sometimes requiring more specific, explicit prompting.