How to verify a data breach

an illustrated laptop on a red darkened background, with blue flakes of data spilling out of the laptop’s screen — indicating a data spill/leak.

Image Credits: Bryce Durbin / TechCrunch

an email from StockX asking the user to “reset your StockX password,” citing “system updates.”

StockX’s password reset email to customers citing unspecified “system updates.” Image Credits: file photo.

a screenshot showing 10 million records in the database featuring the term “socom.mil” in the entry, allowing us to determine how many emails without seeing the contents.

A screenshot showing how we queried the database to count how many emails contained a search term, such as an email domain. In this case, it was “socom.mil,” the email domain for U.S. Special Operations Command. Image Credits: TechCrunch

These are some of the ways TechCrunch checks to see if a data breach is real

Over the years, TechCrunch has extensively covered data breaches. In fact, some of our most-read stories have come from reporting on huge data breaches, such as revealing misleading security practices at startups holding sensitive genetic information or disproving privacy claims by a popular messaging app.

It’s not just our sensitive information that can spill online. Some data breaches can contain information that can have significant public interest or that is highly useful for researchers. Last year, a disgruntled hacker leaked the internal chat logs of the prolific Conti ransomware gang, exposing the operation’s innards, and a huge leak of a billion resident records siphoned from a Shanghai police database revealed some of China’s sprawling surveillance practices.

But one of the biggest challenges in reporting on data breaches is verifying that the data is authentic, and not someone trying to stitch together fake data from disparate places to sell to buyers who are none the wiser.

Verifying a data breach helps both companies and victims take action, especially in cases where neither is yet aware of an incident. The sooner victims know about a data breach, the more action they can take to protect themselves.

Every data breach is different and requires a unique approach to determine the validity of the data. Verifying a data breach as authentic will require using different tools and techniques, and looking for clues that can help identify where the data came from.

In the spirit of Lee’s work, we also wanted to dig into a few examples of data breaches we have verified in the past, and how we approached them.

How we caught StockX hiding its data breach affecting millions

It was August 2019 and users of the sneaker selling marketplace StockX received a mass email saying they should change their passwords due to unspecified “system updates.” But that wasn’t true. Days later, TechCrunch reported that StockX had been hacked and someone had stolen millions of customer records. StockX was forced to admit the truth.

How we confirmed the hack was in part luck, but it also took a lot of work.

Soon after we published a story noting it was odd that StockX would force potentially millions of its customers to change their passwords without warning or explanation, someone contacted TechCrunch claiming to have stolen a database containing records on 6.8 million StockX customers.

The person said they were selling the alleged data on a cybercrime forum for $300, and agreed to provide TechCrunch a sample of the data so we could verify their claim. (In reality, we would still be faced with this same situation had we seen the hacker’s online posting.)

The person shared 1,000 stolen StockX user records as a comma-separated file, essentially a spreadsheet with one customer record on every new line. That data appeared to contain StockX customers’ personal information, like their name, email address, and a copy of the customer’s scrambled password, along with other information believed unique to StockX, such as the user’s shoe size, what device they were using, and what currency the customer was trading in.
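A first pass over a comma-separated sample like this can be sketched in a few lines of Python. This is only an illustration of the idea, not the process TechCrunch used: the column names and the three rows below are invented for the example, since the article does not describe the file's actual layout.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: these column names and rows are assumptions made
# for illustration, not the layout of the real file.
SAMPLE_CSV = """name,email,password_hash,shoe_size,device,currency
Jane Doe,jane@icloud.com,5f4dcc3b5aa7,8.5,ios,USD
John Roe,john@example.com,,10,android,GBP
Ann Poe,ann@me.com,e99a18c428cb,7,ios,USD
"""


def summarize_sample(csv_text):
    """Count rows, non-empty values per column, and email domains."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    filled = Counter()
    domains = Counter()
    for row in rows:
        for column, value in row.items():
            # Skip missing cells (None) and overflow cells (lists).
            if isinstance(value, str) and value.strip():
                filled[column] += 1
        email = (row.get("email") or "").strip().lower()
        if "@" in email:
            domains[email.rsplit("@", 1)[1]] += 1
    return {"rows": len(rows), "filled": dict(filled), "domains": dict(domains)}


summary = summarize_sample(SAMPLE_CSV)
print(summary["rows"], summary["filled"], summary["domains"])
```

A summary like this shows at a glance which fields are consistently populated and which email providers appear, which helps decide whom it is practical to contact.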

In this case, we had an idea of where the data originally came from and worked under that assumption (unless our subsequent checks suggested otherwise). In theory, the only people who know if this data is accurate are the users who trusted StockX with their data. The greater the number of people who confirm their information was valid, the greater the chance that the data is authentic.

Since we cannot legally check if a StockX account was valid by logging in using a person’s password without their permission (even if the password wasn’t scrambled and unusable), TechCrunch had to contact users to ask them directly.

We will typically seek out people who we know can be contacted quickly and respond instantly, such as through a messaging app. Although StockX’s data breach contained only customer email addresses, this data was still useful since some messaging apps, like Apple’s iMessage, allow email addresses in place of a phone number. (If we had phone numbers, we could have tried contacting potential victims by sending a text message.) As such, we used an iMessage account set up with a @techcrunch.com email address so the people we were contacting knew the request was really coming from us.

Since this was the first time the StockX customers we contacted were hearing about this breach, the communication had to be clear, transparent and explanatory and had to require little effort for recipients to respond.

We sent messages to dozens of people whose email addresses used to register a StockX account were @icloud.com or @me.com, which are commonly associated with Apple iMessage accounts. By using iMessage, we could also see that the messages we sent were “delivered,” and in some cases, depending on the person’s settings, it said if the message was read.
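Picking out those addresses from a larger sample is a simple filter. The sketch below is a minimal illustration under stated assumptions: the function name and the example addresses are hypothetical, and an @icloud.com or @me.com address only suggests, rather than guarantees, that the person can be reached over iMessage.

```python
# Domains the article names as commonly associated with iMessage accounts.
IMESSAGE_LIKELY_DOMAINS = {"icloud.com", "me.com"}


def likely_imessage_contacts(emails, domains=IMESSAGE_LIKELY_DOMAINS):
    """Return de-duplicated, lowercased addresses whose domain is in `domains`."""
    seen = set()
    selected = []
    for raw in emails:
        email = raw.strip().lower()
        local, sep, domain = email.rpartition("@")
        # Require a non-empty local part, an "@", and a matching domain.
        if sep and local and domain in domains and email not in seen:
            seen.add(email)
            selected.append(email)
    return selected


# Hypothetical addresses for illustration only.
contacts = likely_imessage_contacts(
    ["A@iCloud.com", "b@example.com", "c@me.com ", "a@icloud.com"]
)
print(contacts)
```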

The message we sent to StockX victims included who we were (“I’m a reporter at TechCrunch”), and the reason why we were reaching out (“We found your information in an as-yet-unreported data breach and need your help to verify its authenticity so we can notify the company and other victims”). In the same message, we presented information that only they could know, such as their username and shoe size that was associated with the same email address we were messaging (“Are you a StockX user with [username] and [shoe size]?”). We chose information that was easily confirmable but nothing too sensitive that could further expose the person’s private data if read by someone else.

By writing messages this way, we’re building credibility with a person who may have no idea who we are, or may otherwise ignore our message suspecting it’s some kind of scam.

We sent similar custom messages to dozens of people, and heard back from a portion of those we contacted and followed up with. Usually a selected sample size of around ten or a dozen confirmed accounts would suggest valid and authentic data. Every person who responded to us confirmed that their information was accurate. TechCrunch presented the findings to StockX, prompting the company to try to get ahead of the story by disclosing the massive data breach in a statement on its website.

How we figured out leaked 23andMe user data was genuine

Just like StockX, 23andMe’s recent security incident prompted a mass password reset in October 2023. It took 23andMe another two months to confirm that hackers had scraped sensitive profile data on 6.9 million 23andMe customers directly from its servers, data on about half of all 23andMe’s customers.

TechCrunch figured out fairly quickly that the leaked 23andMe data was in all likelihood genuine, and in doing so learned that hackers had published portions of the 23andMe data two months earlier in August 2023. What later transpired was that the scraping began months earlier in April 2023, but 23andMe failed to notice until portions of the scraped data began circulating on a popular subreddit.

The first signs of a breach at 23andMe began when a hacker posted on a known cybercrime forum a sample of 1 million account records of Ashkenazi Jews and 100,000 users of Chinese descent who use 23andMe. The hacker claimed to have 23andMe profile, ancestry records, and raw genetic data for sale.

But it wasn’t clear how the data was exfiltrated or even if the data was genuine. Even 23andMe said at the time it was working to verify whether the data was authentic, an effort that would take the company several more weeks to confirm.

The sample of 1 million records was also formatted as a comma-separated spreadsheet of data, revealing reams of similarly and neatly formatted records, each line containing an alleged 23andMe user profile and some of their genetic data. There was no user contact information, only names, gender, and birth years. But this wasn’t enough information for TechCrunch to contact them to verify if their information was accurate.

The precise formatting of the leaked 23andMe data suggested that each record had been methodically pulled from 23andMe’s servers, one by one, but likely at high speed and considerable volume, and organized into a single file. Had the hacker broken into 23andMe’s network and “dumped” a copy of 23andMe’s user database directly from its servers, the data would likely present itself in a different format and contain additional information about the server that the data was stored on.

One thing immediately stood out from the data: Each user record contained a seemingly random 16-character string of letters and numbers, known as a hash. We found that the hash serves as a unique identifier for each 23andMe user account, but also serves as part of the web address for the 23andMe user’s profile when they log in. We checked this for ourselves by creating a new 23andMe user account and looking for our 16-character hash in our browser’s address bar.
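Spotting an identifier like this inside a web address can be automated with a short pattern match. The sketch below is illustrative only and rests on assumptions the article does not confirm: that the identifier appears as its own path segment, that it is exactly 16 lowercase letters or digits, and the example.com URLs are placeholders rather than real profile links.

```python
import re
from urllib.parse import urlparse

# Assumption: the identifier is exactly 16 letters or digits. The article
# only says "letters and numbers", so the character set is a guess.
HASH_PATTERN = re.compile(r"[0-9a-z]{16}")


def extract_profile_hash(url):
    """Return the first 16-character path segment in `url`, or None."""
    for segment in urlparse(url).path.split("/"):
        candidate = segment.lower()
        if HASH_PATTERN.fullmatch(candidate):
            return candidate
    return None


# Placeholder URLs for illustration; not real profile links.
found = extract_profile_hash("https://example.com/profile/0123456789abcdef/")
missing = extract_profile_hash("https://example.com/profile/short/")
print(found, missing)
```

Any ordinary 16-letter path segment would also match, so a result from a pattern like this is a lead to check by hand, not proof of a valid identifier.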

We also found that plenty of people on social media had historic tweets and posts sharing links to their 23andMe profile pages, each featuring the user’s unique hash identifier. When we tried to access the links, we were blocked by a 23andMe login wall, presumably because 23andMe had fixed whatever flaw had been exploited to allegedly exfiltrate huge amounts of account data and wiped out all public sharing links in the process. At this point, we believed the user hashes could be useful if we were able to match each hash against other data on the internet.

When we plugged a handful of 23andMe user account hashes into search engines, the results returned web pages containing reams of matching ancestry data published years earlier on websites run by genealogy and ancestry hobbyists documenting their own family histories.

In other words, some of the leaked data had been published in part online already. Could this be old data sourced from previous data breaches?

One by one, the hashes we checked from the leaked data perfectly matched the data published on the genealogy pages. The key thing here is that the two sets of data were formatted somewhat differently, but contained enough of the same unique user information, including the user account hashes and matching genetic data, to suggest that the data we checked was authentic 23andMe user data.
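The matching step, comparing two differently formatted datasets by a shared identifier, can be sketched as below. This is a minimal illustration rather than the method TechCrunch used: the field names, the records, and the normalization rule (collapse whitespace, ignore case, compare as text) are all assumptions made for the example.

```python
def normalize(value):
    """Collapse whitespace and ignore case so formatting differences don't matter."""
    return " ".join(str(value).split()).casefold()


def cross_match(leaked, published, fields):
    """Compare records keyed by identifier across two datasets.

    `leaked` and `published` map an identifier to a record dict.
    A record matches only if every field in `fields` is present in
    both records and equal after normalization.
    """
    report = {"match": [], "mismatch": [], "not_found": []}
    for identifier, leaked_record in leaked.items():
        published_record = published.get(identifier)
        if published_record is None:
            report["not_found"].append(identifier)
            continue
        same = all(
            field in leaked_record
            and field in published_record
            and normalize(leaked_record[field]) == normalize(published_record[field])
            for field in fields
        )
        report["match" if same else "mismatch"].append(identifier)
    return report


# Hypothetical records for illustration only.
leaked = {
    "aaaaaaaaaaaaaaaa": {"name": "Jane  Doe", "birth_year": "1970"},
    "bbbbbbbbbbbbbbbb": {"name": "John Roe", "birth_year": "1980"},
    "cccccccccccccccc": {"name": "Ann Poe", "birth_year": "1990"},
}
published = {
    "aaaaaaaaaaaaaaaa": {"name": "jane doe", "birth_year": 1970},
    "bbbbbbbbbbbbbbbb": {"name": "John Roe", "birth_year": 1981},
}
report = cross_match(leaked, published, ["name", "birth_year"])
print(report)
```

Identifiers that are found but disagree on a field are as informative as matches, since a pattern of mismatches would suggest stitched-together or altered data.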

It was clear at this point that 23andMe had experienced a huge leak of customer data, but we could not confirm for sure how recent or new this leaked data was.

A genealogy hobbyist whose website we referenced for looking up the leaked data told TechCrunch that they had about 5,000 relatives discovered through 23andMe documented meticulously on their website, hence why some of the leaked records matched the hobbyist’s data.

The leaks didn’t stop. Another dataset, purportedly on 4 million British users of 23andMe, was posted online in the days that followed, and we repeated our verification process. The new set of published data contained numerous matches against the same previously published data. This, too, appeared to be authentic 23andMe user data.

And so that’s what we reported. By December, 23andMe admitted that it had experienced a huge data breach attributed to a mass scrape of data.

The company said hackers used their access to around 14,000 hijacked 23andMe accounts to scrape vast amounts of account and genetic data belonging to other 23andMe users who had opted in to a feature designed to match relatives with similar DNA.

While 23andMe tried to blame the breach on the victims whose accounts were hijacked, the company has not explained how that access permitted the mass downloading of data from the millions of accounts that were not hacked. 23andMe is now facing dozens of class-action lawsuits related to its security practices prior to the breach.

How we confirmed that U.S. military emails were spilling online from a government cloud

Sometimes the source of a data breach, even an inadvertent release of personal information, is not a shareable file packed with user data. Sometimes the source of a breach is in the cloud.

The cloud is a fancy term for “someone else’s computer,” which can be accessed online from anywhere in the world. That means companies, organizations and governments will store their files, emails, and other workplace documents in vast servers of online storage often run by a handful of the Big Tech giants, like Amazon, Google, Microsoft, and Oracle. And, for their highly sensitive customers like governments and militaries, the cloud companies offer separate, segmented and highly fortified clouds for extra protection against the most dedicated and resourced spies and hackers.

In reality, a data breach in the cloud can be as simple as leaving a cloud server connected to the internet without a password, allowing anyone on the internet to access whatever contents are stored inside.

It happens, and more than you might think. People actually find them! And some folks are really good at it.

Anurag Sen is a good-faith security researcher who’s well known for discovering sensitive data mistakenly published to the internet. He’s found numerous spills of data over the years by scouring the web for leaky clouds with the goal of getting them fixed. It’s a good thing, and we thank him for it.

Over the Presidents Day federal holiday weekend in February 2023, Sen contacted TechCrunch, alarmed. He found what looked like the sensitive contents of U.S. military emails spilling online from Microsoft’s dedicated cloud for the U.S. military, which should be highly secured and locked down. Data spilling from a government cloud is not something you see very often, like a rush of water blasting from a hole in a dam.

But in reality, someone, somewhere (and somehow) removed a password from a server in this supposedly highly fortified cloud, effectively punching a huge hole in this cloud server’s defenses and allowing anyone on the open internet to digitally dive in and peruse the data within. It was human error, not a malicious hack.

If Sen was right and these emails proved to be genuine U.S. military emails, we had to move quickly to ensure the leak was plugged as soon as possible, fearing that someone nefarious would soon find the data.

Sen shared the server’s IP address, a string of numbers assigned to its digital location on the internet. Using an online service like Shodan, which automatically catalogs databases and servers found exposed to the internet, it was easy to quickly identify a few things about the exposed server.

First, Shodan’s listing for the IP address confirmed that the server was hosted on Microsoft’s Azure cloud specifically for U.S. military customers (also known as “usdodeast”). Second, Shodan revealed specifically what application on the server was leaking: an Elasticsearch engine, often used for ingesting, organizing, analyzing and visualizing huge amounts of data.

Although the U.S. military inboxes themselves were secure, it appeared that the Elasticsearch database tasked with analyzing these inboxes was insecure and inadvertently leaking data from the cloud. The Shodan listing showed the Elasticsearch database contained about 2.6 terabytes of data, the equivalent of dozens of hard drives packed with emails. Adding to the sense of urgency in getting the database secured, the data inside the Elasticsearch database could be accessed through the web browser simply by typing in the server’s IP address. All to say, these military emails were incredibly easy to find and access by anyone on the internet.

By this point, we had ascertained that this was almost certainly real U.S. military email data spilling from a government cloud. But the U.S. military is enormous and disclosing this was going to be tricky, especially during a federal holiday weekend. Given the potential sensitivity of the data, we had to figure out quickly who to contact and make this their priority, and not drop emails with potentially sensitive information into a faceless catch-all inbox with no guarantee of getting a response.

Sen also provided screenshots (a reminder to document your findings!) showing exposed emails sent from a number of U.S. military email domains.

Since Elasticsearch data is accessible through the web browser, the data within can be queried and visualized in a number of ways. This can help to contextualize the data you’re dealing with and provide hints as to its potential ownership.

For example, many of the screenshots Sen shared contained emails related to @socom.mil, or U.S. Special Operations Command, which carries out special military operations overseas.

We wanted to see how many emails were in the database without looking at their potentially sensitive contents, and used the screenshots as a reference point.

By submitting queries to the database within our web browser, we used the in-built Elasticsearch “count” parameter to retrieve the number of times a specific keyword, in this case an email domain, was matched against the database. Using this counting technique, we determined that the email domain “socom.mil” was referenced in more than 10 million database entries. By that logic, since SOCOM was significantly affected by this leak, it should bear some responsibility in remediating the exposed database.
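A count query of this kind returns only a number, not the matching documents. The sketch below shows roughly what such a request and its response look like using Elasticsearch's count endpoint with a `q` query-string parameter. It makes no network request, and several details are assumptions: the article does not give the exact query TechCrunch typed, the port (9200 is Elasticsearch's default), or the index name, and the IP address is a documentation placeholder. The response text is a made-up example of the endpoint's JSON shape, not real data.

```python
import json
from urllib.parse import quote, urlunsplit


def build_count_url(host, term, port=9200, index=None):
    """Build a URL for Elasticsearch's count endpoint for an exact phrase.

    Assumptions: plain HTTP, default port 9200, and an optional index name.
    """
    path = f"/{index}/_count" if index else "/_count"
    # Wrap the term in double quotes so it is searched as a phrase,
    # then percent-encode it for use in the query string.
    encoded_phrase = quote(f'"{term}"', safe="")
    return urlunsplit(("http", f"{host}:{port}", path, f"q={encoded_phrase}", ""))


def parse_count(response_text):
    """Extract the integer count from a count-endpoint JSON response."""
    return int(json.loads(response_text)["count"])


# 192.0.2.10 is a reserved documentation address, used here as a placeholder.
url = build_count_url("192.0.2.10", "socom.mil")

# Made-up response body showing the shape of the JSON, not real results.
example_response = '{"count": 42, "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}}'
count = parse_count(example_response)
print(url, count)
```

Because only the `count` field is read, an approach like this gives a sense of scale for a search term without retrieving or displaying any message contents.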

And that is who we contacted. The exposed database was secured the following day, and our story published soon after.

It took a year for the U.S. military to disclose the breach, notifying some 20,000 military personnel and other affected individuals of the data spill. It remains unclear exactly how the database became public in the first place. The Department of Defense said the vendor, Microsoft in this case, had corrected the issue that resulted in the exposure, suggesting the spill was Microsoft’s responsibility to bear. For its part, Microsoft has still not acknowledged the incident.

To contact this reporter, or to share breached or leaked data, you can get in touch on Signal and WhatsApp at +1 646-755-8849, or by email. You can also send files and documents via SecureDrop.
