Is your SQL Server environment ready for GDPR?

Problem

Hopefully you have heard of the General Data Protection Regulation, also known as the GDPR. Many of my colleagues here in North America have brushed this pending regulation off, mainly because they think it only applies to companies in Europe. Nothing could be further from the truth: come May 2018, this regulation is going to affect almost all of us, and we need to be ready.

The regulation is a slew of legalese. The gist of what you need to know is this: If you transfer or store the personal information of even a single person in the EU within your data center(s) or cloud service(s), you are responsible for controlling access to, and protecting, that information. The fine for non-compliance can be up to 20 million Euros, or more for companies above a certain revenue threshold, per incident.
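To put a number on "or more": the regulation's upper fine tier (Article 83(5)) is capped at 20 million Euros or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. The short Python sketch below only computes that ceiling; the function name is an illustrative choice, and the actual fine in any given case is decided by the regulator and may be far lower.

```python
def max_gdpr_fine_eur(annual_turnover_eur):
    """Ceiling of the upper fine tier: the higher of EUR 20M or 4% of turnover.

    annual_turnover_eur is total worldwide annual turnover in whole Euros.
    This is the maximum, not a prediction of the fine actually imposed.
    """
    return max(20_000_000, annual_turnover_eur * 4 // 100)


if __name__ == "__main__":
    for turnover in (100_000_000, 500_000_000, 1_000_000_000):
        print(f"turnover {turnover:>13,} EUR -> ceiling {max_gdpr_fine_eur(turnover):,} EUR")
```

In other words, the 4% figure only takes over once turnover exceeds 500 million Euros; below that, the 20 million Euro ceiling applies.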

The “personal data” you are responsible for protecting includes obvious things like birthdates, credit card details, and national identification numbers, but also extends to data such as e-mail addresses, photographs, and IP addresses. You must not only protect it from unlawful access (and properly report any breach), but you also need to ensure that a user can access, correct, and obtain a copy of all of their personal data. You must also be prepared to comply with their requests to not use their data for marketing purposes, or to permanently delete the data from all of your systems. And not just an unsubscribe or a soft delete; a full, permanent removal.

Now that panic is setting in, your question is probably, “Where do I start?!?!?”

Solution

In the short term, your main challenge is assessing risk. Are you storing personal information of any kind, or are you trading with any companies or customers based in the EU? Do you have any intentions of doing so? If yes, you need to be able to answer several questions:

Where are we storing personal information?

What is that information used for?

Do we currently make it clear to users that we are storing this information?

Do we give them a clear choice to opt in or out?

How long do we maintain it?

Can any of it be eliminated?

Who internally has access to the data?

Do we audit access to this data? (e.g. using SQL Audit, home-grown, or third-party solutions)

Do we encrypt any of the information:

At rest? (e.g. Transparent Data Encryption, column-level encryption)
In transit and in memory? (e.g. Always Encrypted)

Do users have an easy way to:

See it?
Correct it?
Obtain a full copy?
Have it deleted?

Can I get the business on board?

In this tip, I can give some advice at a high level, but I can’t really help you answer most of these questions. In fact, I can’t even tell you how to identify where you are storing personal data – never mind why.

In a simple world, you could just search all the databases on all of your SQL Server instances and look for typical column names like “BirthDate,” “SSN,” and “Email.” But I’ve seen way too many systems where these columns are given obscure and cryptic names like BD226_4, NAT_N, and DCEM. Sometimes these are obscured for “security,” or because of automation, or because those abbreviations or acronyms really do mean something to someone. This, I think, is going to be one of the biggest stumbling blocks for people getting their systems documented and on their way to compliance: someone really has to know the schema, or spend a lot of time manually investigating every single table. Having a good data dictionary in place is a good start, so if you’re currently in a state of “oh my,” you might want to dig that up (or get cracking on one). There are tools that can help with this, like SQL Doc and DOCxPRESS, but there is still going to be some manual work involved in extending that to identify which columns in fact store personal information.
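The "simple world" search described above can be sketched in a few lines of Python. This is only a rough illustration: it assumes you have already pulled (table, column) name pairs out of your databases (for example, from INFORMATION_SCHEMA.COLUMNS), and both the pattern list and the function name find_candidate_columns are hypothetical choices of mine, not an authoritative or exhaustive set.

```python
import re

# Hypothetical name fragments that often hint at personal data.
# This list is an assumption for illustration; extend it for your own schemas.
PERSONAL_DATA_PATTERNS = [
    r"birth", r"dob", r"ssn", r"e-?mail", r"phone",
    r"address", r"passport", r"credit.?card", r"ip.?addr",
]

_PATTERN = re.compile("|".join(PERSONAL_DATA_PATTERNS), re.IGNORECASE)


def find_candidate_columns(columns):
    """Return the (table, column) pairs whose column name matches a pattern."""
    return [(table, column) for table, column in columns if _PATTERN.search(column)]


if __name__ == "__main__":
    sample = [
        ("dbo.Customers", "BirthDate"),
        ("dbo.Customers", "Email"),
        ("dbo.Customers", "SSN"),
        ("dbo.Orders", "OrderTotal"),
        ("dbo.Legacy", "BD226_4"),  # cryptic names like these are not caught
        ("dbo.Legacy", "NAT_N"),
        ("dbo.Legacy", "DCEM"),
    ]
    for table, column in find_candidate_columns(sample):
        print(f"{table}.{column}")
```

Note that the cryptic names slip straight through, which is exactly the stumbling block: a name-based scan gives you a list of candidates to review (including some false positives), not a finished inventory.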

As far as who has access, this can be difficult to determine as well. In systems where SQL authentication is in place, it can be impossible to figure out who has passwords for all of the various logins. Even using Windows authentication, complications include access granted through group / role membership, shared or built-in accounts, or Group Managed Service Accounts (GMSAs). But you can keep track of who is accessing what, and from what machine in some cases, with a variety of solutions inside SQL Server – you can log all successful logins, or use logon triggers to determine who is accessing the system, and then SQL Server Audit to determine what data they’re accessing.

You can also use a variety of scripts to find potential security issues at the server level (see this tip), and there are other tools out there to help you validate server configuration and other settings across your entire estate. Once you have identified which databases and columns contain personal data, it is easy enough to check if those databases are encrypted at rest (say, using Transparent Data Encryption) or if individual columns are encrypted (hopefully using Always Encrypted for maximum protection).
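Here is a minimal Python sketch of that cross-check. It assumes you have already exported two pieces of metadata: the is_encrypted flag for each database from sys.databases, and the encryption_type for each column from sys.columns (NULL, shown as None here, when a column is not protected by Always Encrypted). The inventory format, the sample names, and the function name are all hypothetical, and columns encrypted by other means (such as ENCRYPTBYKEY) would need a separate check.

```python
def find_unprotected(inventory, db_encrypted, column_encryption):
    """List personal-data columns covered by neither TDE nor Always Encrypted.

    inventory:         iterable of (database, table, column) holding personal data
    db_encrypted:      dict of database -> bool (sys.databases.is_encrypted)
    column_encryption: dict of (database, table, column) -> encryption_type,
                       with missing or None meaning "not Always Encrypted"
    """
    unprotected = []
    for database, table, column in inventory:
        tde = db_encrypted.get(database, False)
        always_encrypted = column_encryption.get((database, table, column)) is not None
        if not tde and not always_encrypted:
            unprotected.append((database, table, column))
    return unprotected


if __name__ == "__main__":
    # Sample data is made up purely for illustration.
    inventory = [
        ("Sales", "dbo.Customers", "Email"),
        ("Sales", "dbo.Customers", "SSN"),
        ("HR", "dbo.Candidates", "BirthDate"),
    ]
    db_encrypted = {"Sales": False, "HR": True}
    column_encryption = {("Sales", "dbo.Customers", "SSN"): 2}  # 2 = randomized

    for database, table, column in find_unprotected(inventory, db_encrypted, column_encryption):
        print(f"No encryption found: {database}.{table}.{column}")
```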

I can’t really help you protect against potential vulnerabilities – anything from SQL injection vectors, to the wrong people having access to certificates, to someone leaving a laptop in a car or forgetting it on the subway. This is probably painful to hear, but I don’t know of any magic tool that can find all of these possible attack vectors. But you’ll need to make sure you have policies in place that ensure personal data is kept as safe as possible (for example, it should never physically reside on a laptop, and a laptop that leaves the premises should not be left in a state where remote systems are easily accessible). Development copies of databases should not contain production data. Web sites and applications that access personal data should be thoroughly penetration-tested for SQL injection vulnerabilities and backdoors. And so on.

Once you have identified all of the personal data that you store or transfer, you will need to make sure it is easy for users (fully authenticated, of course) to see their own data, change it, or ask you to remove it – and you’ll need to be able to comply. This may mean new forms or instructions visible in a public place, such as your web site, and new procedures in your applications to follow through with viewing, updating, and deleting data.
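To give a feel for what those new procedures involve, here is a small, self-contained Python sketch of "give me a full copy" and "permanently delete me." It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; it is not SQL Server code, and the table names, key columns, and function names are all hypothetical. The one assumption that matters is that you maintain a list of every table that holds personal data, along with the column that ties each row to a user.

```python
import sqlite3

# Hypothetical map of every table holding personal data to its user-key column.
# These names come from this fixed constant, never from user input.
PERSONAL_TABLES = {"customers": "customer_id", "support_calls": "customer_id"}


def make_demo_db():
    """Build a throwaway in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER, email TEXT);
        CREATE TABLE support_calls (customer_id INTEGER, phone TEXT);
        INSERT INTO customers VALUES (1, 'a@example.com'), (2, 'b@example.com');
        INSERT INTO support_calls VALUES (1, '555-0100'), (1, '555-0101');
    """)
    return conn


def export_user_data(conn, user_id):
    """Return every row held about one user, grouped by table name."""
    conn.row_factory = sqlite3.Row
    result = {}
    for table, key in PERSONAL_TABLES.items():
        rows = conn.execute(f"SELECT * FROM {table} WHERE {key} = ?", (user_id,))
        result[table] = [dict(row) for row in rows]
    return result


def erase_user(conn, user_id):
    """Hard-delete one user's rows from every listed table; return rows removed."""
    removed = 0
    with conn:  # single transaction: either every table is cleaned or none is
        for table, key in PERSONAL_TABLES.items():
            cur = conn.execute(f"DELETE FROM {table} WHERE {key} = ?", (user_id,))
            removed += cur.rowcount
    return removed


if __name__ == "__main__":
    conn = make_demo_db()
    print(export_user_data(conn, 1))   # the user's full copy
    print(erase_user(conn, 1), "rows permanently removed")
    print(export_user_data(conn, 1))   # nothing left for this user
```

A real implementation has much more to worry about than this sketch does: backups, replicas, logs, and downstream copies all count as "your systems," and none of them are touched here.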

Now, none of this is possible without business buy-in. Your immediate challenge is convincing management that GDPR compliance is going to be critical in just a few short months, and that the faster you start preparing, the better off you’ll be. Initially you might be thinking only about your transactional databases, but this can affect everyone, from HR managers collecting candidate information, to the e-mail addresses being collected by marketing, to the phone numbers and MAC addresses recorded by the support team.

Microsoft is developing tools to help with GDPR compliance, under the umbrella of “Compliance Manager,” but so far, they are targeted at operating systems as well as services like Office 365, OneDrive, SharePoint, and Exchange. I haven’t seen anything spelled out for SQL Server just yet, but hopefully we’ll see something soon.

Next Steps

You can get a rough assessment of your readiness for GDPR compliance by taking the following survey:

https://aka.ms/gdprassessment

This is a useful process and should give you a really good idea about where you are. After that, you should start classifying and labeling all of your data as personal or not, and have a look at these GDPR resources:

Two white papers from Microsoft:

Microsoft Webcast: Thriving in the GDPR era: How to accelerate your journey to compliance

And finally, the Microsoft Trust Center has a ton of other resources on GDPR.

Aaron Bertrand

Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He also blogs at sqlblog.org.

MSSQLTips Awards: Author of the Year – 2016, 2023 | Leadership (200+ tips) – 2022
