How does GDPR impact your SQL Server Recovery Plans?
Do you know how GDPR (General Data Protection Regulation) will affect your SQL Server Disaster Recovery (DR) plans?
With GDPR on the imminent horizon, there are many things that Data Platform engineers need to consider with regard to the rights of data subjects. Chief among these is understanding how requests to be forgotten, and requests to update a person's data, affect the restoring of databases. If you hold data that is within the scope of the GDPR, here are some thoughts on the extra processes your DR planning will need.
Before we look at how the GDPR affects our DR processes, it is worth taking some time to get an overview of what the regulation is. Essentially, the GDPR is a piece of European legislation written to enshrine a ‘privacy first’ approach to handling the Personally Identifiable Information (PII) of individuals in the European Union (EU). Because it is data-centric, it applies to whoever is handling the data, not just to organizations based in the EU. PII covers anything that can be used to identify someone personally, including but not limited to name, address, and date of birth, all the way through to IP addresses that can be traced back to an individual. As such, it is a good idea to understand what data you hold and whether it includes data relating to individuals in the EU.
The GDPR grants data subjects several rights; those most likely to affect how we handle and restore our backups are:
- The right to rectification;
- The right to erasure; and,
- The right to access.
Of most interest are rectification and erasure, which I will concentrate on in this tip.
GDPR Impact on SQL Server Disaster Recovery Planning
As any DBA knows, a solid restore strategy is underpinned by regular backups that support it. One of the key measures of that strategy is meeting a defined Recovery Time Objective (RTO): how long it will take to bring the system back online. Anything that can affect the RTO needs to be tested, quantified, and documented, and it is the RTO that the rights to erasure and rectification can affect.
So, how does this impact my Disaster Recovery planning? Well, let's work through a scenario and all will become clear.
When the business receives a request to remove or update the data relating to a data subject, it must act on it without undue delay, and in most cases within one month. The request, the action taken, and the notification to the data subject that it has been completed all need to be logged. Now suppose that, for whatever reason, we have an issue with our database and need to restore from a backup. Once the backup has been restored, the data is potentially out of date: there could be several records that should have been updated or removed because of GDPR requests processed since the backup was taken. We cannot bring the system online until these changes have been reapplied and the data is consistent and correct; this is especially important if any automated routines process the data and perform actions based on it.
What does this mean? It means that the recovery process will need to update the database so that it correctly reflects the requests that were received and processed. The impact of this depends on a few variables, most notably the age of the backup, the number of requests that need to be replayed, and the mechanism for applying the updates. All of these affect the time it takes to bring the system back online, especially if the updates cannot be automated and included as part of the restore process.
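As a minimal sketch of how the age of the backup drives the amount of work, the Python below selects, from a log of processed requests kept outside the database, those that must be replayed after a restore, oldest first. The `GdprRequest` record and the `requests_to_replay` function are hypothetical names for illustration. The sketch assumes the backup's start time is used as a conservative cut-off: a request applied while the backup was running is replayed rather than risk being missed, which is only safe if replaying a request the backup already contains is harmless (deleting an already-deleted row, or re-applying the same updates in order).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GdprRequest:
    """Hypothetical record of a GDPR request, logged outside the database."""
    request_id: int
    subject_id: int
    kind: str               # "erasure" or "rectification"
    processed_at: datetime  # when the change was applied to the live system


def requests_to_replay(log, backup_started_at):
    """Return the requests a restore from this backup must replay, oldest first.

    Assumption: the backup start time is a conservative cut-off, so anything
    processed while the backup was running is replayed as well. That is only
    safe if replaying a request already reflected in the backup is harmless.
    """
    pending = [r for r in log if r.processed_at >= backup_started_at]
    return sorted(pending, key=lambda r: (r.processed_at, r.request_id))


backup_started_at = datetime(2018, 5, 1, 2, 0)
log = [
    GdprRequest(1, 10, "erasure", datetime(2018, 4, 30, 9, 0)),
    GdprRequest(3, 12, "rectification", datetime(2018, 5, 2, 8, 0)),
    GdprRequest(2, 11, "erasure", datetime(2018, 5, 1, 14, 0)),
]
print([r.request_id for r in requests_to_replay(log, backup_started_at)])  # → [2, 3]
```

The older the backup, the longer this list becomes, and every entry on it adds to the time before the system can go back online.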
GDPR Appropriate Recovery Process
Below is an example of a basic recovery process that takes into account servicing GDPR requests as part of a database recovery operation.
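As a rough illustration of that ordering (restore, replay the outstanding requests, verify, and only then bring the system online), here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for the restored SQL Server database so that it runs anywhere; the `customer` table, the `(kind, customer_id, new_email)` request tuples, and the `replay_requests`, `verify`, and `recover` functions are all hypothetical. On a real system the restore itself happens first, and the database is kept away from users and automated jobs until the replay and verification succeed.

```python
import sqlite3


def replay_requests(conn, requests):
    """Apply logged GDPR requests, in order, to a freshly restored database.

    `requests` is a list of hypothetical (kind, customer_id, new_email) tuples;
    new_email is ignored for erasures. Both statements are idempotent, so a
    request the backup already reflects can be replayed without harm.
    """
    with conn:  # one transaction: every request is replayed, or none is
        for kind, customer_id, new_email in requests:
            if kind == "erasure":
                conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
            elif kind == "rectification":
                conn.execute(
                    "UPDATE customer SET email = ? WHERE id = ?",
                    (new_email, customer_id),
                )
            else:
                raise ValueError(f"unknown request kind: {kind!r}")


def verify(conn, requests):
    """Return True only if the latest request per subject is reflected in the data."""
    latest = {}
    for kind, customer_id, new_email in requests:
        latest[customer_id] = (kind, new_email)  # later requests win
    for customer_id, (kind, new_email) in latest.items():
        row = conn.execute(
            "SELECT email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if kind == "erasure" and row is not None:
            return False
        if kind == "rectification" and row is not None and row[0] != new_email:
            return False
    return True


def recover(conn, requests):
    """Replay and verify; only then report the database as safe to bring online."""
    replay_requests(conn, requests)
    if not verify(conn, requests):
        raise RuntimeError("GDPR replay incomplete; keep the system offline")
    return "online"


# Stand-in for the restored backup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)
conn.commit()

pending = [("erasure", 2, None), ("rectification", 3, "c.new@example.com")]
print(recover(conn, pending))  # → online
print(conn.execute("SELECT id, email FROM customer ORDER BY id").fetchall())
# → [(1, 'a@example.com'), (3, 'c.new@example.com')]
```

Wrapping the replay in a single transaction means a bad request leaves the restored data untouched rather than half-updated, so the process can be fixed and rerun.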
Recovery Plan Testing
As we all know, the only way to validate a recovery plan is to test it. But how do you account for the variable number of requests that need to be replayed? In this situation, I would suggest the following.
- Identify the maximum data-loss window that your Recovery Point Objective (RPO) states is acceptable (the amount of data you are permitted to lose). This also tells you the oldest backup(s) you can restore from.
- If you have existing metrics for customer requests to update or remove data, identify the median and 95th percentile for the number of requests received in the RPO period. In the absence of this data, look at the number of records in the system that fall within the scope of GDPR, and the associated data that could potentially require modification or removal.
- First, document the time taken to perform one update and one removal, then scale this up into batches. This lets you understand the overhead of applying these changes. The objective is to identify how many updates or removals can be accommodated within the RTO when restoring from the oldest backup the RPO allows. Once you know this, you can manage the expectations of the business.
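The arithmetic in these three steps can be sketched in Python. Everything here is an assumption for illustration: the per-window request counts, the restore time, and the per-request replay cost are placeholders for your own measurements (for the per-request cost, the slower of your measured update and removal times, scaled from batch tests, is a conservative choice), and the 95th percentile uses the nearest-rank method. The function names are hypothetical.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def replay_budget(rto_seconds, restore_seconds, seconds_per_request):
    """How many GDPR requests fit in the RTO time left after the restore itself."""
    remaining = rto_seconds - restore_seconds
    if remaining <= 0:
        return 0
    return int(remaining // seconds_per_request)


def assess(requests_per_rpo_window, rto_seconds, restore_seconds, seconds_per_request):
    """Compare typical and 95th-percentile request volumes with the replay budget."""
    budget = replay_budget(rto_seconds, restore_seconds, seconds_per_request)
    median = statistics.median(requests_per_rpo_window)
    p95 = percentile(requests_per_rpo_window, 95)
    return {"budget": budget, "median": median, "p95": p95, "p95_fits": p95 <= budget}


# Placeholder history: number of requests received in each past RPO window.
history = [12, 30, 18, 25, 40, 22, 15, 90, 28, 20]
print(assess(history, rto_seconds=3600, restore_seconds=2700, seconds_per_request=2.5))
# → {'budget': 360, 'median': 23.5, 'p95': 90, 'p95_fits': True}
```

If `p95_fits` comes back false, that is the point at which to talk to the business: the options are a longer RTO, a shorter RPO (more frequent backups leave fewer requests to replay), or a faster way of applying the changes.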
Another important element to understand is how data subjects can request rectification or erasure. If this capability is built into the application, so that users can update or delete their own account or data, then these changes also need to be recorded outside of the system so that they can form part of the restore process. If an update is made within the RPO data-loss window, a restored system may not reflect it, because that data falls within the permitted loss. By tracking these activities so that they can be replayed, you can be confident that, even if you have restored the system to the RPO limit, GDPR-related updates and removals will be applied.
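One simple way to record these user-initiated changes outside the system is an append-only JSON Lines file, written at the moment the change is made. The file format, the field names, and the `record_request` and `load_requests` helpers are hypothetical; a separate database, a message queue, or a table on another server would serve the same purpose.

```python
import json
from datetime import datetime, timezone


def record_request(path, subject_id, kind, changes=None):
    """Append one user-initiated GDPR change to a log kept outside the database.

    Erasures store only an internal subject key, so the log does not itself
    retain the personal data that was removed. Rectifications must store the
    new values, because those are what a restore has to re-apply.
    """
    if kind not in ("erasure", "rectification"):
        raise ValueError(f"unknown request kind: {kind!r}")
    entry = {
        "subject_id": subject_id,
        "kind": kind,
        "changes": changes or {},
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append mode: existing entries are never rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_requests(path):
    """Read every logged request back, oldest first, for replay after a restore."""
    with open(path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return sorted(entries, key=lambda e: datetime.fromisoformat(e["logged_at"]))
```

For this to help, the log has to live on storage that is not restored along with the database (and is itself backed up); otherwise a restore could roll the log back together with the data it is meant to repair.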
There is no easy, one-size-fits-all solution to the problems faced when dealing with GDPR and Data Platform operations. All I can suggest is to recognize that GDPR is a business problem and not an IT problem. Yes, there are IT components to the solution, but these need to carry out the business processes that are put in place to meet the GDPR compliance requirements. How this is implemented will be largely driven by the business, but for the business to account for these risks, it needs to be made aware of them.
There is a very good document called Preparing for the General Data Protection Regulation (GDPR) 12 steps to take now, produced by the UK Information Commissioner’s Office (ICO), which is the regulator that will enforce the GDPR in the UK. I would recommend reading it and then starting to identify the data that falls within the scope of GDPR in your organization.
When preparing for GDPR compliance, you should read up on these additional tips covering SQL Server security features that can help you meet these compliance requirements.
- Is your SQL Server environment ready for GDPR?
- Steps to restore a database that has a SQL Server Audit defined
- SQL Server Database Security Audit (Part 1): What to Expect?
- SQL Server Security Audit (Part 2): Scripts to help you or where you can find more information
- SQL Server Security Checklist
- SQL Server Dynamic Data Masking Discovery and Implementation
- Securing and protecting SQL Server data, log and backup files with TDE
- How to Automate SQL Server Restores for a Test Server
- Identify when a SQL Server database was restored, the source and backup date