An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.
Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.
In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" said that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public.
"The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."
On social media, some users raised concerns that the secrecy could erode FrontierMath's reputation as an objective benchmark. In addition to funding FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark, a fact that Epoch AI didn't disclose prior to December 20, when o3 was announced.
In a post on X, Stanford mathematics Ph.D. student Carina Hong also asserted that OpenAI had privileged access to FrontierMath thanks to its arrangement with Epoch AI, and that this wasn't sitting well with some contributors.

"Six mathematicians who significantly contributed to the FrontierMath benchmark confirmed [to me] … that they are unaware that OpenAI will have exclusive access to this benchmark (and others won't)," Hong said. "Most express they are not sure they would have contributed had they known."
In a response to Meemi's post, Tamay Besiroglu, associate director of Epoch AI and one of the organization's co-founders, asserted that the integrity of FrontierMath hadn't been compromised, but admitted that Epoch AI "made a mistake" in not being more transparent.

"We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible," Besiroglu wrote. "Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI."
Besiroglu added that while OpenAI has access to FrontierMath, it has a "verbal agreement" with Epoch AI not to use FrontierMath's problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a "separate holdout set" that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

"OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set," Besiroglu wrote.
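The logic of a holdout set is straightforward: if a slice of problems is withheld from everyone, including funders, a model that was trained on the public problems should score noticeably better on those than on the unseen ones. The sketch below is a hypothetical illustration of that idea in Python, not Epoch AI's actual methodology; the function names, split ratio, and scoring interface are all assumptions.

```python
# Hypothetical sketch of a benchmark holdout split and a simple
# contamination check. This is NOT Epoch AI's code; names and the
# 20% holdout fraction are illustrative assumptions.
import random

def split_benchmark(problems, holdout_fraction=0.2, seed=0):
    """Partition a benchmark into a public set and a private holdout set."""
    rng = random.Random(seed)
    shuffled = problems[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]  # (public, holdout)

def contamination_gap(score_fn, public, holdout):
    """Difference between accuracy on public and holdout problems.

    score_fn(problem) should return 1 if the model solves the problem,
    0 otherwise. A large positive gap is a red flag that the model may
    have seen, or been trained on, the public problems.
    """
    public_acc = sum(score_fn(p) for p in public) / len(public)
    holdout_acc = sum(score_fn(h) for h in holdout) / len(holdout)
    return public_acc - holdout_acc
```

In this framing, a score near zero on the gap is consistent with a clean evaluation, while "teaching to the test" shows up as inflated public-set accuracy that the holdout set cannot reproduce.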
However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results.

"My personal opinion is that [OpenAI's] score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performance," Glazer said. "However, we can't vouch for them until our independent evaluation is complete."
The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI, and of securing the necessary resources for benchmark development without creating the perception of conflicts of interest.