OpenAI’s agent tool may be nearing release

OpenAI CEO Sam Altman speaks during the OpenAI DevDay event on November 06, 2023 in San Francisco, California.

Image Credits: Justin Sullivan / Getty Images

OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf.

Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks like writing code and booking travel.

According to The Information, OpenAI is targeting January as Operator’s release month. Code uncovered by Blaho this weekend adds credence to that reporting.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to “Toggle Operator” and “Force Quit Operator,” per Blaho. And OpenAI has added references to Operator on its website, Blaho said, albeit references that aren’t yet publicly visible.

Confirmed – the ChatGPT macOS desktop app has hidden options to define shortcuts for the desktop launcher to “Toggle Operator” and “Force Quit Operator” https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS

Tibor Blaho (@btibor91), January 19, 2025

According to Blaho, OpenAI’s site also contains not-yet-public tables comparing the performance of Operator to other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator isn’t 100% reliable, depending on the task.

OpenAI website already has references to Operator/OpenAI CUA (Computer Use Agent) – “Operator System Card Table”, “Operator Research Eval Table” and “Operator Refusal Rate Table”

Including comparison to Claude 3.5 Sonnet Computer use, Google Mariner, etc.

(preview of tables… pic.twitter.com/OOBgC3ddkU

Tibor Blaho (@btibor91), January 20, 2025

On OSWorld, a benchmark that tries to mimic a real computer environment, “OpenAI Computer Use Agent (CUA)” (possibly the AI model powering Operator) scores 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% that humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks.

Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, Operator was only successful 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time.

We’ve reached out to OpenAI for comment and will update this piece if we hear back.

OpenAI’s imminent entry into the AI agent space comes as rivals, including the aforementioned Anthropic, Google, and others, make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Agents today are rather primitive. But some experts have raised concerns about their safety, should the technology rapidly improve.

One of the leaked charts shows Operator performing well on select safety evaluations, including tests that try to get the system to perform “illicit activities” and search for “sensitive personal data.” Reportedly, safety testing is among the reasons for Operator’s long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent he claims lacks safety mitigations.

“I can only imagine the negative response if OpenAI made a similar release,” Zaremba wrote.

It’s worth noting that OpenAI has been criticized by AI researchers, including ex-staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
