A Chinese lab has created what appears to be one of the most powerful "open" AI models to date.
The model, DeepSeek V3, was developed by the AI firm DeepSeek and was released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
DeepSeek V3 can handle a range of text-based workloads and tasks, like coding, translating, and writing essays and emails from a descriptive prompt.
According to DeepSeek's internal benchmark testing, DeepSeek V3 outperforms both downloadable, "openly" available models and "closed" AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek outperformed other models, including Meta's Llama 3.1 405B, OpenAI's GPT-4o, and Alibaba's Qwen 2.5 72B.
DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to evaluate, among other things, whether a model can successfully write new code that integrates into existing code.
DeepSeek-V3!

60 tokens/second (3x faster than V2!)
API compatibility intact
Fully open-source models & papers
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens

Beats Llama 3.1 405b on almost every benchmark https://t.co/OiHu17hBSIpic.twitter.com/jVwJU07dqf

— Chubby ♨️ (@kimmonismus) December 26, 2024
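The "37B activated parameters" figure in that tweet refers to DeepSeek V3's mixture-of-experts design: the model contains many expert sub-networks, but a router sends each token through only a few of them, so most of the 671 billion parameters sit idle on any given forward pass. Here is a toy sketch of the routing idea (illustrative NumPy only; the expert count, dimensions, and scoring below are invented for clarity and are not DeepSeek's actual architecture):

```python
# Toy mixture-of-experts (MoE) layer: many experts, few activated per token.
# All sizes here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, dim = 8, 2, 16

experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
router = rng.standard_normal((dim, num_experts))

def moe_layer(x):
    scores = x @ router                    # one routing score per expert
    top = np.argsort(scores)[-top_k:]      # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only top_k of num_experts experts do any work for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(dim)
print(moe_layer(token).shape)  # (16,); computed with 2 of 8 experts
```

This is why a mixture-of-experts model with 671 billion total parameters can be far cheaper to run per token than a dense model of the same size: the compute cost scales with the activated parameters, not the total.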
DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data; 1 million tokens is equal to about 750,000 words.
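That ratio is easy to check for yourself. Here's a minimal sketch using OpenAI's open source tiktoken library as a stand-in tokenizer (an assumption for illustration; DeepSeek V3 ships its own tokenizer, so its exact word-to-token ratio will differ):

```python
# Minimal sketch: comparing word count to token count for a text sample.
# tiktoken's cl100k_base encoding is a stand-in, not DeepSeek's tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "DeepSeek V3 was trained on a dataset of 14.8 trillion tokens."
tokens = enc.encode(text)
words = text.split()

print(f"{len(words)} words -> {len(tokens)} tokens")
# English text averages roughly 0.75 words per token, which is where the
# "1 million tokens is about 750,000 words" rule of thumb comes from.
```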
It's not just the training set that's massive. DeepSeek V3 is enormous in size: 671 billion parameters, or 685 billion on AI dev platform Hugging Face. (Parameters are the internal variables models use to make predictions or decisions.) That's around 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being … https://t.co/EW7q2pQ94B
Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But large models also require beefier hardware in order to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.
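For developers who do have that kind of hardware, loading the open weights from Hugging Face would look something like the following minimal sketch (the repo id "deepseek-ai/DeepSeek-V3", the trust_remote_code flag, and the generation settings are assumptions based on the standard transformers workflow, not details confirmed in DeepSeek's release):

```python
# Minimal sketch: loading DeepSeek V3's open weights via transformers.
# Assumptions: the repo id and custom-code flag below are illustrative;
# a model this size needs a multi-GPU server to run at all.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # run the model code shipped with the repo
    device_map="auto",        # shard weights across all available GPUs
    torch_dtype="auto",
)

prompt = "Write a short email declining a meeting."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The device_map="auto" setting asks the library to spread the weights across every available GPU, which is the only realistic way to fit a model of this size in memory.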
While it's not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model using a data center of Nvidia H800 GPUs in just around two months, GPUs that Chinese companies were recently restricted by the U.S. Department of Commerce from procuring. The company also claims it only spent $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI's GPT-4.
The downside is that the model's political views are a bit … filtered. Ask DeepSeek V3 about Tiananmen Square, for example, and it won't answer.
DeepSeek, being a Chinese company, is subject to benchmarking by China's internet regulator to ensure its models' responses "embody core socialist values." Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.
DeepSeek, which in late November unveiled DeepSeek-R1, an answer to OpenAI's o1 "reasoning" model, is a curious organization. It's backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
High-Flyer builds its own server clusters for model training, one of the most recent of which reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science grad, High-Flyer aims to achieve "superintelligent" AI through its DeepSeek org.
In an interview earlier this year, Wenfeng characterized closed-source AI like OpenAI's as a "temporary" moat. "[It] hasn't stopped others from catching up," he noted.
Indeed .