‘Scraping’ the barrel? The risks of publicly available data

The social media giants have found themselves in the news again, and not for positive reasons.

Earlier this month, it was widely reported that details of more than 530 million Facebook users worldwide had been made available online, including phone numbers and some email addresses. The data supposedly even included CEO Mark Zuckerberg’s own mobile number. Just days later, the data of up to 500 million LinkedIn users was alleged to have been put up for sale online.

The companies’ reactions were similar. Both denied any wrongdoing on their part or even that there had been any breach of their security. Instead, they argued that the data came from publicly available sources. Nevertheless, a number of regulators around the world have opened investigations into the Facebook incident. So what exactly is going on?

In a detailed response, Facebook argued that this data had been ‘scraped’ from publicly available information, saying:

“Scraping is a common tactic that often relies on automated software to lift public information from the internet … We believe the data in question was scraped from people’s Facebook profiles by malicious actors using our contact importer prior to September 2019. This feature was designed to help people easily find their friends to connect with on our services using their contact lists.”

Facebook’s contact importer tool has now been fixed to prevent further scraping of this data. LinkedIn’s statement also referred to the scraping of publicly available data, which was aggregated with data from other sources to create the database now supposedly on sale online. Both companies blame the data scrapers for breaching the websites’ terms and conditions.

In legal terms, the social media companies, the (as yet unidentified) data scrapers and any potential buyers of the data each have responsibilities. As ‘controllers’ for personal data that is created and posted on their websites, the social media companies must comply with relevant data protection law. In the UK and the EU, this means they must take ‘appropriate technical and organisational measures’ to ensure appropriate security of personal data, including protection against unauthorised or unlawful processing.

Clearly, there is very little that these companies can do to prevent information being copied from public-facing websites, particularly when the data has been actively published by users on their own individual profiles. However, if Facebook’s own contact importer tool was being manipulated to enable the data to be scraped, then it is legitimate to ask whether Facebook had really taken all appropriate steps to prevent such unauthorised processing. This may well be the focus of any future investigation by regulators.

Even if the data scrapers are only gathering publicly available information, this does not give them a completely free pass. Data protection law applies to all ‘personal data’, regardless of whether or not it is already in the public domain. Once the data is in their hands, the data scrapers would become controllers themselves and would be responsible for compliance with all aspects of data protection law. They would need to comply with the data protection principles, provide appropriate privacy notices and have a lawful basis for their processing of the data. Given that we don’t even know their identities, it is very unlikely that the data scrapers will be meeting these requirements.

It is also a criminal offence under section 170 of the Data Protection Act 2018 to knowingly or recklessly obtain personal data without the consent of the controller, or to sell or offer to sell personal data obtained in these circumstances. In addition, any breaches of social media companies’ website terms and conditions could give rise to civil claims, which Facebook and LinkedIn and their expensive lawyers may be keen to pursue.

Finally, anyone tempted to purchase this data would be very wise to decline the offer. Quite apart from the criminal offence outlined above, a purchaser would find it very difficult to use the data without breaching data protection law themselves. Although personal data can be a valuable business asset, reputable purchasers should always undertake appropriate due diligence on sellers to ensure that the data was collected lawfully and can be used for the purposes the purchaser intends. It is very unlikely that the data could pass those checks in these cases, even if it is derived purely from publicly available sources.