Harvard and Google to release 1 million public-domain books as AI training dataset

Image Credits: Nadezhda Deineka / Getty Images

AI training data has a big price tag, one best suited to deep-pocketed tech firms. This is why Harvard University plans to release a dataset that includes in the region of 1 million public-domain books, spanning genres, languages, and authors including Dickens, Dante, and Shakespeare, which are no longer copyright-protected due to their age.

The new dataset isn’t available yet, and it’s not clear when or how it will be released. However, it contains books derived from Google’s longstanding book-scanning project, Google Books, and thus Google will be involved in releasing “this treasure trove far and wide.”

Harvard first teased the Institutional Data Initiative (IDI) back in March, outlining its plans to create a “trusted conduit for legal data for AI.” However, not much has been heard from it until its formal launch today, which comes with confirmation that the IDI has financial backing from Microsoft and OpenAI.

The IDI’s executive director Greg Leppert says the dataset is designed to “level the playing field” by opening up such a huge dataset to anyone, from research labs to AI startups, that wants to train their large language models (LLMs).
