Date: 2020-08-21

A Hong Kong-registered company that sells data on social media influencers has left as many as 235 million user profiles scraped from Instagram, TikTok and YouTube exposed on the web, with no password or other authentication required to access them, according to a report by British research firm Comparitech.

Security researcher Bob Diachenko, who leads Comparitech’s cybersecurity research team, uncovered three identical copies of a database on August 1, Comparitech said in the report on Wednesday. Each copy included names, contact information, images and statistics about followers.

The data was from a company called Social Data, which helps businesses “find influencers and get in-depth insights into demographic and psychographic data of influencers and their audience throughout different types of social media on the web”, according to its website.

The vast majority of the profiles were scraped from Facebook-owned Instagram, and the largest data sets included two with data from more than 95 million Instagram profiles each. At least 42 million records from TikTok and nearly 4 million from Google-owned YouTube were also in the database, according to the Comparitech report, which added that about one in five records contained either a phone number or an email address.

The breach comes at a time when both Western and Chinese social media giants are coming under heavy scrutiny from governments over their data protection policies.

Last year, Facebook agreed to pay a fine over the Cambridge Analytica scandal, which involved millions of Facebook users’ personal data being harvested without their consent and used for political campaigns, including those related to the 2016 US Presidential Election and the UK’s referendum the same year on leaving the European Union.

TikTok has also been criticised by governments in countries including the US, India and France for its data collection practices. The short video app is now blocked in India and faces a similar ban in the US if it does not divest its American operations within 90 days, US President Donald Trump said last Friday.

Much of the data originated from another now-defunct firm called Deep Social, with which Social Data denies any connection, said Comparitech. It added in the report that Social Data’s chief technology officer acknowledged the exposure and the servers hosting the data were taken down about three hours later.

Web scraping is an automated task that copies data and information from web pages in bulk. It can be difficult to distinguish the automated bots from normal website visitors, so it is hard for social media platforms to prevent them from accessing user profiles, according to the research firm.

Comparitech’s report said Social Data has insisted it only scrapes what is publicly accessible, but the practice is against the terms of use for Facebook, Instagram, TikTok and YouTube.

Information scraped and stored in this way is “vulnerable to spam marketing and phishing campaigns”, Comparitech warned in its report, adding that “even though the information is publicly available, the size and scope of an aggregated database makes it more vulnerable to mass attack than it would be in isolation”.

Social Data and TikTok did not immediately respond to the Post’s requests for comment.

A YouTube representative said that the video platform’s terms of service explicitly forbid collecting data that can be used to identify a person.

“We are currently investigating the specific issue, and will send Social Data a cease and desist letter if the scraping activity is verified or otherwise we believe it necessary,” the representative said.

Facebook spokeswoman Stephanie Otway said that scraping people’s information from Instagram is a clear violation of the company’s policies.

“We revoked Deep Social’s access to our platform in June 2018 and sent a legal notice prohibiting any further data collection,” Otway said.

According to the Comparitech report, a spokesperson from Social Data told the research firm that “all of the data is available freely to anyone with internet access” and that “social networks themselves expose the data to outsiders – that is their business”.

“Those users who do not wish to provide information, make their accounts private,” the spokesperson reportedly said.

Michael Gazeley, managing director of Hong Kong cybersecurity firm Network Box, said that despite the size of the leak, he did not think that it was a particularly serious breach.

“I don’t think it’s really a breach of privacy, if the data is already public,” he said. “It’s far more worrying when critical, private, data is leaked. For example: passwords, bank details, health records.”

He added: “It becomes more serious if it’s possible to do data analysis, for say political manipulation, but the key data, in this case as far as I understand it, isn’t critical private data.”

Nathaniel Rushforth, a US-qualified lawyer and cybersecurity specialist at Shanghai-based DaWo Law Firm, also said that scraping public profile information is a legal “grey area”, and whether it amounts to a real breach of privacy is “highly debatable”.

“Scraping itself is not necessarily illegal, and it probably doesn’t really breach anybody’s privacy in any significant way,” he said, although he added that some countries penalise offences such as misusing scraped data to inappropriately target people for financial gain or exploiting the data in anticompetitive ways.

“The only real way to prevent a determined data-gatherer from obtaining information on you is to limit what information you put online,” Rushforth said.

This article was first published in the South China Morning Post.