Image Credits: Nadezhda Deineka / Getty Images
AI training data has a big price tag, one best suited to deep-pocketed tech firms. This is why Harvard University plans to release a dataset that includes in the region of 1 million public-domain books, spanning genres, languages, and authors including Dickens, Dante, and Shakespeare, which are no longer copyright-protected due to their age.
The new dataset isn’t available yet, and it’s not clear when or how it will be released. However, it contains books derived from Google’s longstanding book-scanning project, Google Books, and thus Google will be involved in releasing “this treasure trove far and wide.”
Harvard first teased the Institutional Data Initiative (IDI) back in March, outlining its plans to create a “trusted conduit for sound data for AI.” However, not much has been heard from it until its formal launch today, which comes with confirmation that the IDI has financial backing from Microsoft and OpenAI.
The IDI’s executive director, Greg Leppert, says the dataset is designed to “level the playing field” by opening up such a huge dataset to anyone, from research labs to AI startups, that wants to train their large language models (LLMs).