A majority of organizations are still struggling to comply with GDPR, and – just as bad – only 14 percent of U.S.-based companies are ready to comply with the CCPA. Waterline Data’s Alex Gorelik outlines what organizations can do to bake compliance into every aspect of company culture.
One month after GDPR went into effect in May of 2018, California passed its own consumer privacy act, which by 2020 will give California-based consumers data protections similar to those EU-based individuals now enjoy. These include the right to know what data is being collected and how it’s being used, the right to opt out of the sale of that data and the right to have it deleted. The law, based on an opt-out consent model (as opposed to GDPR’s opt-in requirement), will affect any business, regardless of location, that collects data on California-based individuals and meets the law’s thresholds (for example, annual gross revenue above $25 million).
After years of high-profile data breaches and growing consumer resistance to the data-tracking practices of digital advertisers and businesses (more than 200 million users have downloaded AdBlock alone), public trust in how businesses handle their personal data seems at a breaking point. The California Consumer Privacy Act of 2018 (CCPA) will arrive on January 1, 2020, and it will be the toughest privacy law in the country. But don’t count on it being the last.
Organizations around the world are facing a much more punitive regulatory landscape. British Airways is facing a nearly $230 million GDPR-related fine stemming from a 2018 data breach that exposed the credit card information of many of its passengers. Google and Facebook are facing GDPR fines of up to $5 billion and $2.2 billion, respectively, and for these companies, it’s just the tip of the iceberg. France’s National Commission on Informatics and Liberty (CNIL) has also fined Google $57 million for not properly informing users how data is collected across its properties. Facebook is on the hook for $5 billion with the FTC, $100 million with the SEC and £500,000 with the U.K. Information Commissioner’s Office for its part in the Cambridge Analytica scandal and other transgressions. If anything is to be learned from these fines, it’s that compliance is not optional.
Compliance: An Enormous and Complex Task
Even so, more than 50 percent of organizations are still struggling to comply with GDPR, according to the International Association of Privacy Professionals. Recent data from Dimensional Research shows only 14 percent of U.S.-based companies are ready to comply with the CCPA, and 44 percent haven’t even begun implementation. The problem is the sheer enormity and complexity of the task. Governing all of an organization’s sensitive data, or even knowing what sensitive data exists and where it’s located within the enterprise, isn’t easy. Classifying that data is just as hard: given the volume that must be discovered, reliability suffers when the task is left to business users.
Consider this:
A typical large health care provider stores 4.1 billion columns of data. A typical financial services company captures more than 10 million data sets per day. As these oceans of new data pour in, only a small fraction, the so-called critical data elements (CDEs), is tagged, and the tagging is a painfully slow, error-prone manual process. Most data is left miscategorized, lost or still waiting to be discovered, and therefore impossible to track. Most companies have between 100 and 200 CDEs, yet CCPA covers any data you have on your California-based customers: typically thousands, and sometimes hundreds of thousands, of data elements, depending on your business and how your data is organized and represented.
Some organizations, therefore, still respond to data governance initiatives by quarantining large volumes of data and limiting access to it. But when all data is treated as sensitive, including data that isn’t, business analysts must submit formal access requests to understaffed IT groups that can take weeks, if not months, to respond. As a result, much of the data’s value in today’s real-time world is lost to this blanket approach.
Even data that’s buried and virtually inaccessible is still subject to the “right to delete” rules in regulations like GDPR and CCPA. These rules require organizations to erase personal data on a number of grounds, including when it is no longer necessary “in relation to the purposes for which they were collected or otherwise processed” and, explicitly, upon the individual’s request. (If data is compromised, companies are also required to notify affected customers of the breach. Imagine telling a customer who asked to be forgotten, and was assured the request had been fulfilled, that their data had in fact been exposed because the company didn’t know it existed in a particular data set.) But how can you delete personal information, let alone prove it’s been discarded, if you don’t even know where it is?
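The deletion problem can be made concrete with a minimal Python sketch. It assumes a hypothetical inventory (`INVENTORY`) mapping each known table to the column that holds a customer identifier, and a hypothetical `erase_subject` helper that deletes a customer’s rows from every inventoried table and returns an audit log as evidence of what was removed. The demo deliberately includes a `legacy_export` table that never made it into the inventory: the erasure reports success, yet the customer’s data survives there. Table and column names are illustrative, and an in-memory SQLite database stands in for a real data estate.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical inventory: table -> column holding the customer identifier.
# Building and maintaining this map is the step most organizations lack.
INVENTORY = {
    "crm_contacts": "customer_id",
    "web_orders": "buyer_id",
    "support_tickets": "requester_id",
}


def erase_subject(conn, subject_id, inventory):
    """Delete a data subject's rows from every inventoried table and
    return an audit log recording what was removed and when."""
    audit = []
    for table, column in inventory.items():
        # Table/column names come from the trusted inventory; the value is bound.
        cur = conn.execute(f'DELETE FROM "{table}" WHERE "{column}" = ?', (subject_id,))
        audit.append({
            "table": table,
            "rows_deleted": cur.rowcount,
            "deleted_at": datetime.now(timezone.utc).isoformat(),
        })
    conn.commit()
    return audit


def build_demo_db():
    """In-memory stand-in for an enterprise data estate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm_contacts (customer_id TEXT, email TEXT)")
    conn.execute("CREATE TABLE web_orders (buyer_id TEXT, total REAL)")
    conn.execute("CREATE TABLE support_tickets (requester_id TEXT, body TEXT)")
    # A forgotten copy that was never added to the inventory.
    conn.execute("CREATE TABLE legacy_export (cust TEXT, email TEXT)")
    conn.executemany("INSERT INTO crm_contacts VALUES (?, ?)",
                     [("c1", "a@example.com"), ("c2", "b@example.com")])
    conn.executemany("INSERT INTO web_orders VALUES (?, ?)",
                     [("c1", 20.0), ("c1", 35.5), ("c2", 12.0)])
    conn.execute("INSERT INTO support_tickets VALUES ('c1', 'refund please')")
    conn.execute("INSERT INTO legacy_export VALUES ('c1', 'a@example.com')")
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    for entry in erase_subject(db, "c1", INVENTORY):
        print(entry["table"], entry["rows_deleted"])
    leftover = db.execute(
        "SELECT COUNT(*) FROM legacy_export WHERE cust = ?", ("c1",)
    ).fetchone()[0]
    print("rows for c1 still in legacy_export:", leftover)
```

Running it reports one deleted row in `crm_contacts`, two in `web_orders` and one in `support_tickets`, and then shows that one row for `c1` remains in `legacy_export`: the audit log looks complete only because the inventory is incomplete.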
Because most companies lack a comprehensive inventory of their data, they keep tabs on only about 10 to 20 percent of their total data estate. This lack of understanding also hampers their ability to mask sensitive data and to properly track all processing activities, including the categories of recipients of personal data, transfers of personal data to a third country or an international organization, and the parties that process data on the organization’s behalf. Another big challenge is implementing consistent governance policies across heterogeneous systems that use different technologies and are managed by different teams with competing priorities.
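Knowing which columns are sensitive is what makes selective masking possible, instead of quarantining whole data sets. The Python sketch below assumes that set of sensitive column names is already known (the hard part described above) and pseudonymizes only those columns with a keyed HMAC-SHA256 token, so masked values stay consistent across tables and can still be joined. `SECRET_KEY`, `pseudonymize` and `mask_row` are hypothetical names; a real deployment would keep the key in a managed secret store, and keyed hashing is only one of several masking techniques.

```python
import hashlib
import hmac
from typing import Dict, Set

# Hypothetical key for illustration only; store real keys in a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed token: the same input always yields the same token,
    so masked columns can still be joined, but the original can't be read back directly."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_row(row: Dict[str, str], sensitive: Set[str]) -> Dict[str, str]:
    """Mask only the columns tagged as sensitive; everything else stays usable."""
    return {col: pseudonymize(val) if col in sensitive else val
            for col, val in row.items()}


if __name__ == "__main__":
    order = {"customer_id": "c1", "email": "a@example.com",
             "order_total": "20.00", "region": "CA"}
    print(mask_row(order, sensitive={"customer_id", "email"}))
```

Analysts keep full access to `order_total` and `region`, while the identifying columns are replaced by stable tokens.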
AI in Data Governance Is a Key (But Often Misunderstood) Requirement
The primary challenge in achieving full data governance is a technological one – for most organizations, there is simply too much data to identify and govern.
Over the years, many organizations have implemented data governance tools that can tell you what kinds of data should be considered sensitive. The problem is that these tools assume you already know where the data resides. Other tools can be deployed at the data security and storage level, and they’re very good at helping you lock down sensitive data, but they suffer from the same problem: They don’t tell you where regulated data is located; where the data came from or where it’s going; or how to identify, report and control new regulated data as it comes in.
According to Forrester Consulting, while data security and privacy are top of mind at 65 percent of companies, only 35 percent say their current tools help them fully understand what data is available, which puts their ability to protect that data at risk. This is because traditional sensitive data discovery tools rely on preconfigured classifiers, such as regular expressions preprogrammed to find easily detectable personally identifiable information (PII) like tax IDs or credit card numbers. GDPR and CCPA, however, extend regulation to everything your business knows about your customers. Since each business is different and has its own representations of the data, it is impossible to find all of that data with traditional tools that rely on prebuilt, simplistic classifiers.
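To see why prebuilt classifiers fall short, here is a minimal Python sketch of the pattern-based approach described above: a regular expression for the U.S. Social Security number format (a common tax ID) and a digit-length pattern plus Luhn checksum for payment card numbers, applied column by column with a hypothetical 80 percent match threshold. The patterns are illustrative, not production-grade. The last example shows the limitation: an organization-specific field such as a loyalty ID is customer data, yet no prebuilt pattern recognizes it.

```python
import re
from typing import Dict, Iterable, Optional

# Prebuilt, pattern-based classifiers (illustrative, not production-grade).
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")   # U.S. SSN format
CARD_PATTERN = re.compile(r"\d{13,19}")          # payment card length range


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_value(value: str) -> Optional[str]:
    v = value.strip()
    if SSN_PATTERN.fullmatch(v):
        return "US_SSN"
    digits = v.replace(" ", "").replace("-", "")
    if CARD_PATTERN.fullmatch(digits) and luhn_valid(digits):
        return "CARD_NUMBER"
    return None


def classify_column(values: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Tag a column if at least `threshold` of its non-empty values share one label."""
    non_empty = [v for v in values if v]
    counts: Dict[str, int] = {}
    for v in non_empty:
        label = classify_value(v)
        if label:
            counts[label] = counts.get(label, 0) + 1
    for label, n in counts.items():
        if n / len(non_empty) >= threshold:
            return label
    return None


if __name__ == "__main__":
    print(classify_column(["4111 1111 1111 1111", "5500 0000 0000 0004"]))  # CARD_NUMBER
    print(classify_column(["123-45-6789", "987-65-4321"]))                  # US_SSN
    print(classify_column(["LX-00482-GOLD", "LX-11930-SILVER"]))            # None
```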
The good news is that AI and machine learning can be applied to automate the discovery of all your customer data across massive data estates with millions or even billions of fields, as well as the subsequent governance of that data. SunTrust Banks, Inc., one of the nation’s largest financial services companies with total assets of $222 billion, is one organization that successfully applied AI- and machine-learning-driven automation to enable full-scale data governance, specifically in response to CCPA. But most organizations haven’t even completed this crucial first step. At the most recent Catalyst Conference, speaker and Gartner analyst Sanjeev Mohan seemed stunned to discover that most of his audience of data professionals didn’t even know such automation capabilities existed.
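One way learning-based discovery can generalize beyond fixed patterns is to learn from columns people have already tagged and suggest the same tag for similar-looking columns elsewhere. The Python sketch below is a deliberately simple nearest-neighbor version of that idea, not the method used by SunTrust or by any particular product: each column is profiled by the coarse “shapes” of its values (letters become `A`, digits `9`), and an untagged column inherits the label of the most similar tagged column if cosine similarity clears a hypothetical 0.9 threshold. The function names and example data are illustrative assumptions; a fuller approach would add features such as column names or value distributions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def shape(value: str) -> str:
    """Coarse value shape: letters -> 'A', digits -> '9', other chars kept,
    runs of the same symbol collapsed (e.g. 'LX-00482' -> 'A-9')."""
    out: List[str] = []
    for ch in value:
        c = "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def profile(values: List[str]) -> Counter:
    """Feature vector for a column: how often each value shape occurs."""
    return Counter(shape(v) for v in values if v)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tag(column: List[str], labeled: Dict[str, List[str]],
                min_score: float = 0.9) -> Optional[Tuple[str, float]]:
    """Return (label, score) of the most similar tagged column, or None."""
    target = profile(column)
    best: Optional[Tuple[str, float]] = None
    for label, examples in labeled.items():
        score = cosine(target, profile(examples))
        if best is None or score > best[1]:
            best = (label, score)
    return best if best and best[1] >= min_score else None


if __name__ == "__main__":
    # Columns a data steward has already tagged (hypothetical examples).
    tagged = {
        "LOYALTY_ID": ["LX-00482-GOLD", "LX-11930-SILVER"],
        "CUSTOMER_EMAIL": ["ann@example.com", "bo@shop.net"],
    }
    print(suggest_tag(["LX-77120-GOLD", "LX-00913-BRONZE"], tagged))  # ('LOYALTY_ID', 1.0)
    print(suggest_tag(["42", "17"], tagged))                          # None
```

Unlike the fixed regular expressions, nothing here is specific to loyalty IDs: the label comes from an example a person tagged once, which is how this kind of approach can reach data that is unique to one organization.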
In addition to detecting sensitive data, the regulations require organizations to track the original business purpose for which data was collected, as well as the business purposes for which it is used. Yet most enterprises have no single place that tracks all of their data sets, their original business purposes and their uses. Any organization, whether or not it is currently affected by GDPR, CCPA or some other industry or government regulation, must consider solutions that directly address the challenges posed by the growing number of data privacy laws with software that:
- Uses AI and machine learning to automatically discover the location of regulated data, no matter how unique to the organization the data is, where it is stored or how it is used.
- Tracks compliance status, providing dashboards and reports on levels of compliance, and records the compliance metadata the regulations require, such as the business purpose for which the data was collected, how the data is used and for what purpose, and so forth.
- Provides a comprehensive set of APIs and adapters so it can be integrated with other tools in the ecosystem that help secure and manage the data, such as the access control systems that manage access in different data sources, data masking tools that mask or encrypt the data, and workflow systems that provision the data and track its usage.
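The compliance metadata these requirements describe can be sketched as a small catalog model in Python. The record fields (`collection_purposes`, `usages`, `third_country`) and the `coverage` and `findings` helpers are hypothetical names chosen for illustration, and comparing purpose strings for exact equality is a stand-in for what is, in practice, a legal judgment about whether a new use is compatible with the original purpose. The point is the shape of the data: each data set carries its regulated tags, the purposes it was collected for and every recorded use, so dashboard-style metrics and exception reports become simple queries.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Usage:
    consumer: str                 # team or system using the data
    purpose: str                  # business purpose of this use
    third_country: bool = False   # transferred outside the jurisdiction?


@dataclass
class DatasetRecord:
    name: str
    location: str
    regulated_tags: Set[str] = field(default_factory=set)       # e.g. from automated discovery
    collection_purposes: Set[str] = field(default_factory=set)  # why it was collected
    usages: List[Usage] = field(default_factory=list)


def coverage(catalog: List[DatasetRecord]) -> float:
    """Share of data sets holding regulated data that have a recorded collection purpose."""
    regulated = [ds for ds in catalog if ds.regulated_tags]
    if not regulated:
        return 1.0
    return sum(1 for ds in regulated if ds.collection_purposes) / len(regulated)


def findings(catalog: List[DatasetRecord]) -> List[str]:
    """Exception report over data sets that hold regulated data."""
    issues = []
    for ds in catalog:
        if not ds.regulated_tags:
            continue
        if not ds.collection_purposes:
            issues.append(f"{ds.name}: regulated data with no recorded collection purpose")
        for use in ds.usages:
            if use.purpose not in ds.collection_purposes:
                issues.append(f"{ds.name}: used by {use.consumer} for undeclared purpose '{use.purpose}'")
            if use.third_country:
                issues.append(f"{ds.name}: transferred to a third country by {use.consumer}")
    return issues


# Hypothetical catalog entries for illustration.
example_catalog = [
    DatasetRecord("web_orders", "warehouse/sales/web_orders",
                  regulated_tags={"CUSTOMER_EMAIL"},
                  collection_purposes={"order fulfillment"},
                  usages=[Usage("logistics", "order fulfillment"),
                          Usage("marketing", "ad targeting")]),
    DatasetRecord("legacy_export", "fileshare/exports/2017",
                  regulated_tags={"LOYALTY_ID"}),
    DatasetRecord("weather_feed", "lake/raw/weather"),
]

if __name__ == "__main__":
    print(f"purpose coverage: {coverage(example_catalog):.0%}")
    for issue in findings(example_catalog):
        print(issue)
```

Here half of the data sets holding regulated data have a recorded purpose, and the report flags both the undeclared marketing use of `web_orders` and the untracked `legacy_export`.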
The data protections enjoyed in California and the EU are bound to spread to other jurisdictions eventually. Despite their best efforts to comply, data-driven, consumer-facing organizations that don’t embrace automation in data governance will continue to drown in their own data and find it increasingly difficult to achieve compliance.