A startup, Vana, says it wants users to get paid for training data
In the generative AI boom, data is the new oil. So why shouldn’t you be able to sell your own?
From Big Tech firms to startups, AI makers are licensing e-books, images, videos, audio and more from data brokers, all in the pursuit of training up more capable (and more legally defensible) AI-powered products. Shutterstock has deals with Meta, Google, Amazon and Apple to supply millions of images for model training, while OpenAI has signed agreements with several news organizations to train its models on news archives.
In many cases, the individual creators and owners of that data haven’t seen a dime of the cash changing hands. A startup called Vana wants to change that.
Anna Kazlauskas and Art Abal, who met in a class at the MIT Media Lab focused on building tech for emerging markets, co-founded Vana in 2021. Prior to Vana, Kazlauskas studied computer science and economics at MIT, eventually leaving to launch a fintech automation startup, Iambiq, out of Y Combinator. Abal, a corporate lawyer by training and education, was an associate at The Cadmus Group, a Boston-based consulting firm, before heading up impact sourcing at data annotation company Appen.
With Vana, Kazlauskas and Abal set out to build a platform that lets users “pool” their data (including chats, speech recordings and photos) into datasets that can then be used for generative AI model training. They also want to create more personalized experiences, such as daily motivational voicemails based on your wellness goals or an art-generating app that understands your style preferences, by fine-tuning public models on that data.
“Vana’s infrastructure in effect creates a user-owned data treasury,” Kazlauskas told TechCrunch. “It does this by allowing users to aggregate their personal data in a non-custodial way … Vana allows users to own AI models and use their data across AI applications.”
Here’s how Vana pitches its platform and API to developers:
The Vana API connects a user’s cross-platform personal data … to allow you to personalize your application. Your app gains instant access to a user’s personalized AI model or underlying data, simplifying onboarding and eliminating compute cost concerns. … We think users should be able to bring their personal data from walled gardens, like Instagram, Facebook and Google, to your application, so you can create amazing personalized experiences from the very first time a user interacts with your consumer AI app.
Creating an account with Vana is fairly simple. After confirming your email, you can attach data to a digital avatar (e.g., selfies, a description of yourself and voice recordings) and explore apps built using Vana’s platform and datasets. The app selection ranges from ChatGPT-style chatbots and interactive storybooks to a Hinge profile generator.
Now, why, you might ask, in this age of increased data privacy awareness and ransomware attacks, would someone ever volunteer their personal info to an anonymous startup, much less a venture-backed one? (Vana has raised $20 million to date from Paradigm, Polychain Capital and other backers.) Can any profit-driven company really be trusted not to abuse or mishandle any monetizable data it gets its hands on?
In response to that question, Kazlauskas stressed that the whole point of Vana is for users to “reclaim control over their data,” noting that Vana users have the option to self-host their data rather than store it on Vana’s servers and to control how their data is shared with apps and developers. She also argued that, because Vana makes money by charging users a monthly subscription (starting at $3.99) and levying a “data transaction” fee on devs (e.g., for transferring datasets for AI model training), the company is disincentivized from exploiting users and the troves of personal data they bring with them.
“We want to create models owned and governed by users who all contribute their data,” Kazlauskas said, “and allow users to bring their data and models with them to any application.”
Now, while Vana isn’t selling users’ data to companies for generative AI model training (or so it claims), it wants to allow users to do this themselves if they choose, starting with their Reddit posts.
This month, Vana launched what it’s calling the Reddit Data DAO (Digital Autonomous Organization), a program that pools multiple users’ Reddit data (including their karma and post history) and lets them decide together how that combined data is used. After joining with a Reddit account, submitting a request to Reddit for their data and uploading that data to the DAO, users gain the right to vote alongside other members of the DAO on decisions like licensing the combined data to generative AI companies for a shared profit.
We have crunched the numbers and r/datadao is now largest data DAO in history: Phase 1 welcomed 141,000 reddit users with 21,000 full data uploads.
r/datadao (@rdatadao), April 11, 2024
It’s an answer of sorts to Reddit’s recent moves to commercialize data on its platform.
Reddit previously didn’t gate access to posts and communities for generative AI training purposes. But it reversed course late last year, ahead of its IPO. Since the policy change, Reddit has raked in over $203 million in licensing fees from companies, including Google.
“The broad idea [with the DAO is] to free user data from the major platforms that seek to hoard and monetize it,” Kazlauskas said. “This is a first and is part of our push to help people pool their data into user-owned datasets for training AI models.”
Unsurprisingly, Reddit, which isn’t working with Vana in any official capacity, isn’t pleased about the DAO.
Reddit banned Vana’s subreddit dedicated to discussion about the DAO. And a Reddit spokesperson accused Vana of “exploiting” its data export system, which is designed to comply with data privacy regulations like the GDPR and California Consumer Privacy Act.
“Our data arrangements allow us to put guardrails on such entities, even on public information,” the spokesperson told TechCrunch. “Reddit does not share non-public, personal data with commercial enterprises, and when Redditors request an export of their data from us, they receive non-public personal data back from us in accordance with applicable laws. Direct partnerships between Reddit and vetted organizations, with clear terms and accountability, matters, and these partnerships and agreements prevent misuse and abuse of people’s data.”
But does Reddit have any real reason to be concerned?
Kazlauskas envisions the DAO growing to the point where it affects the amount Reddit can charge customers for its data. That’s a long way off, assuming it ever happens; the DAO has just over 141,000 members, a tiny fraction of Reddit’s 73-million-strong user base. And some of those members could be bots or duplicate accounts.
Then there’s the matter of how to fairly distribute payments that the DAO might receive from data buyers.
Currently, the DAO awards “tokens” (cryptocurrency) to users corresponding to their Reddit karma. But karma might not be the best measure of quality contributions to the dataset, particularly in smaller Reddit communities with fewer opportunities to earn it.
Kazlauskas floats the idea that members of the DAO could choose to share their cross-platform and demographic data, making the DAO potentially more valuable and incentivizing sign-ups. But that would also require users to place even more trust in Vana to treat their sensitive data responsibly.
Personally, I don’t see Vana’s DAO reaching critical mass. The roadblocks standing in the way are far too many. I do think, however, that it won’t be the last grassroots attempt to assert control over the data increasingly being used to train generative AI models.
Startups like Spawning are working on ways to allow creators to impose rules guiding how their data is used for training, while vendors like Getty Images, Shutterstock and Adobe continue to experiment with compensation schemes. But no one’s cracked the code yet. Can it even be cracked? Given the cutthroat nature of the generative AI industry, it’s certainly a tall order. But perhaps someone will find a way, or policymakers will force one.