Reworkd’s founders went viral on GitHub last year with AgentGPT, a free tool to build AI agents that acquired more than 100,000 daily users in a week. This earned them a spot in Y Combinator’s summer 2023 cohort, but the co-founders quickly realized building general AI agents was too broad. So now Reworkd is a web-scraping company, specifically building AI agents to extract structured data from the public web.
AgentGPT provided a simple interface in a web browser where users could create autonomous AI agents. Soon, everyone was raving about how agents were the future of computing.
When the tool took off, Asim Shrestha, Adam Watkins, and Srijan Subedi were still living in Canada and Reworkd didn’t exist. The massive user influx caught them off guard; Subedi, now Reworkd’s COO, said the tool was costing them $2,000 a day in API calls. For that reason, they had to create Reworkd and get funded fast. One of the most popular use cases for AgentGPT was creating web scrapers, a relatively simple but high-volume task, so Reworkd made this its singular focus.
Web scrapers have become invaluable in the AI era. The number one reason organizations use public web data in 2024 is to build AI models, according to Bright Data’s latest report. The problem is that web scrapers are traditionally built by humans and must be customized for specific web pages, making them expensive. But Reworkd’s AI agents can scrape more of the web with fewer humans in the loop.
Customers can give Reworkd a list of hundreds, or even thousands, of websites to scrape and then specify the types of data they’re interested in. Then Reworkd’s AI agents use multimodal code generation to turn this into structured data. Agents generate unique code to scrape each website and extract that data for customers to use as they please.
For example, say you want stats on every NFL player, but every team’s website has a different layout. Instead of building a scraper for each website, Reworkd’s agents do that for you, given just links and a description of the data you want to extract. With 32 teams, that could save you hours, but if there were 1,000 teams, it could save you weeks.
Reworkd raised a fresh $2.75 million in seed funding from Paul Graham, AI Grant (Nat Friedman and Daniel Gross’ startup accelerator), SV Angel, General Catalyst and Panache Ventures, among others, the startup exclusively told TechCrunch. Combined with a $1.25 million pre-seed investment last year from Panache Ventures and Y Combinator, this brings Reworkd’s total funding raised to date to $4 million.
AI that can use the internet
Shortly after forming Reworkd and moving to San Francisco, the team hired Rohan Pandey as a founding research engineer. He currently lives in AGI House SF, one of the Bay Area’s most popular hacker houses for the AI era. One investor described Pandey as a “one person research lab within Reworkd.”
“We see ourselves as the culmination of this 30-year dream of the Semantic Web,” said Pandey in an interview with TechCrunch, referring to a vision of World Wide Web inventor Tim Berners-Lee in which computers can read the entire internet. “Even though some websites don’t have markup, LLMs can understand the websites in the same ways that humans can, such that we can expose basically any website as an API. So in some sense, Reworkd is like the universal API layer for the internet.”
Reworkd says it’s able to capture the long tail of customer data needs, meaning its AI agents are specifically good for scraping thousands of smaller public websites that large competitors often skip over. Others, such as Bright Data, have scrapers for large websites like LinkedIn or Amazon already built out, but it may not be worth the trouble for a human to build a scraper for every small website. Reworkd addresses this concern, but potentially raises others.
What exactly is “public” web data?
Though web scrapers have existed for decades, they have attracted controversy in the AI era. Unfettered scraping of huge swaths of data has thrown OpenAI and Perplexity into legal trouble: news and media organizations allege the AI companies extracted intellectual property from behind a paywall, reproducing it widely without payment. Reworkd is taking precautions to avoid these issues.
“We look at it as uplifting the accessibility of publicly available information,” said Shrestha, co-founder and CEO of Reworkd, in an interview with TechCrunch. “We’re only allowing information that’s publicly available; we’re not going through sign-in walls or anything like that.”
To go a step further, Reworkd says it’s avoiding scraping news altogether, and being selective about who it works with. Watkins, the company’s CTO, says there are better tools for aggregating news content elsewhere, and it is not their focus.
As an example of what is, Reworkd described its work with Axis, a company that helps policy teams comply with government regulations. Axis uses Reworkd’s AI to extract data from thousands of government regulation documents for many countries across the European Union. Axis then trains and fine-tunes an AI model based on this data and offers it to clients as a product.
Starting a web-scraping company these days could be considered wading into dangerous territory, according to Aaron Fiske, partner at Silicon Valley-based law firm Gunderson Dettmer. The landscape is somewhat fluid right now, and the jury is still out on how “public” web data really is for AI models. However, Fiske says Reworkd’s approach, where customers decide what websites to scrape, may insulate them from legal liability.
“It’s like they invented the copying machine, and there’s this one use case for making copies that turned out to be hugely economically valuable, but also legally, really questionable,” said Fiske in an interview with TechCrunch. “It’s not like web scrapers servicing AI companies is necessarily risky, but working with AI companies that are really interested in harvesting copyrighted content is perhaps an issue.”
That’s why Reworkd is being careful about who it works with. Web scrapers have obfuscated much of the blame in potential copyright infringement cases related to AI thus far. In the OpenAI case, Fiske points out that The New York Times did not sue the web scraper that collected its articles, but rather the company that allegedly reproduced its work. But even there, it’s yet to be decided if what OpenAI did was truly copyright infringement.
There’s more evidence that web scrapers are legally in the clear during the AI boom. A court recently ruled in favor of Bright Data after it scraped Facebook and Instagram profiles via the web. One example in the court case was a dataset of 615 million records of Instagram user data, which Bright Data sells for $860,000. Meta sued the company, alleging this violated its terms of service. But a court ruled that this data is public and therefore available to scrape.
Investors think Reworkd scales with the big guys
Reworkd has attracted big names as early investors, from Y Combinator and Paul Graham to Daniel Gross and Nat Friedman. Some investors say this is because Reworkd’s technology stands to improve, and get cheaper, alongside new models. The startup says OpenAI’s GPT-4o is currently the best for its multimodal code generation and that a lot of Reworkd’s technology wasn’t possible until just a few months ago.
“If you try to compete with the rate of technology progress, not building on top of it, then I think that you’ll have a hard time as a founder,” General Catalyst’s Viet Le told TechCrunch. “Reworkd has the mindset of basing its solution on the rate of progress.”
Reworkd is creating AI agents that address a particular gap in the market; companies need more data because AI is advancing quickly. As more companies build custom AI models specific to their business, Reworkd stands to gain more customers. Fine-tuning models necessitates quality, structured data, and lots of it.
Reworkd says its approach is “self-healing,” meaning that its web scrapers won’t break down due to a web page update. The startup claims to avoid hallucination issues traditionally associated with AI models because Reworkd’s agents are generating code to scrape a website. It’s possible the AI could make a mistake and grab the wrong data from a website, but Reworkd’s team created Banana-lyzer, an open source evaluation framework, to regularly assess its accuracy.
Reworkd doesn’t have a large payroll (the team is just four people), but it does have to take on considerable inference costs for running its AI agents. The startup expects its pricing to get increasingly competitive as these costs trend downward. OpenAI just released GPT-4o mini, a smaller version of its industry-leading model with competitive benchmarks. Innovations like these could make Reworkd more competitive.
Paul Graham and AI Grant did not respond to TechCrunch’s request for comment.