SQL Server Static Data Masking Example
Every organization has some confidential data and sensitive information stored in production databases. Since there is always an incessant need to migrate this live data to lower environments for several development and testing purposes, it becomes important to ensure suitable protection has been provided to this critical data while copying production databases to non-production environments.
In order to reproduce production issues on environments like Dev, Staging, Test, UAT, etc., data professionals tend to create test data by simply copying production data to these lower life cycle environments. The development team typically has unrestricted access to all the sensitive information with no encryption or masking on the production database restore on these environments. This easily accessible data puts confidential data of the organization at risk. Furthermore, in order to be compliant with regulations like GDPR, PCI and HIPAA, it is a must to safeguard sensitive and vital data information in the organization. Let's see one of the ways to do this.
SQL Server 2019 with SSMS 18.0 (preview 5 and higher) introduces a new security feature called Static Data Masking. This technique follows the idea of applying Static Data Masking against a production database and then creating a backup of the database with the mask applied, followed by restoring this masked copy to non-production environments. It is basically a feature that helps users create a masked copy of a SQL database. Once data is statically masked, it is permanently replaced in the cloned database and we can't change it. This feature is used for several purposes like sharing sensitive data, database development, database troubleshooting, analytics and business reporting.
Previously Static Data Masking was just available for Azure SQL DB. Recently Microsoft has made this feature available for both Azure SQL DB and on-premises SQL Server databases.
The Microsoft SQL Security team has incorporated Static Data Masking in SQL Server Management Studio (SSMS) 18.0 Preview 5 and above. This feature works on SQL Server 2012 or higher databases and Azure SQL Database. To make use of Static Data Masking, make sure your system has SSMS 18.0 (preview 5 and above) installed. The latest version of SSMS can be downloaded from here.
In this tip, we will demonstrate a brief example of how Static Data Masking works. Subsequently, we will see how useful it is to use different masking functions on sensitive data.
Create sample database for using SQL Server Static Data Masking
To show how this feature works, let's first quickly create some sample data. We have created a database named StaticDataMaskingDB, and created a table named Customers. You can download T-SQL code from here to get sample data into the StaticDataMaskingDB database. After executing this code and inserting data in the table, you can see the data present in the table Customers as shown below. This table contains columns like FirstName, LastName, SSN, DOB and EmailID. We will be masking most of these columns based on different masking functions available with this feature.
Getting started with Static Data Masking in SQL Server
Open SSMS, and right click on the database name in Object Explorer. Select tasks and Mask Database…(Preview) as shown below.
The below window pops up after selecting Mask Database… (Preview). Since Static Data Masking happens at the column level, the screen below displays all the tables available in the database. Also, you can see an option to Mask all columns in the database. By checking this option, we can mask all the columns of the database. We intend to mask a few columns here and therefore will not use this selection. In the Filter columns text box, we can search for a particular table or column name.
Click on the drop-down icon next to the table name, dbo.Customers and you can see all columns present inside the table as seen below. The datatype and whether the column is nullable is identified along with the column name in the drop down section.
Let's say, we selected column CustomerID in the Customers table. An error is thrown that says Masking is not supported on an identity column. Also, if we choose Null as a masking function from the drop-down list of the masking functions for column SSN (which is not a nullable column), we get an error saying the NULL masking function is selected for a column that does not allow NULL values.
In the case below, we selected the Group Shuffle masking function for column FirstName. A warning was issued saying Group Shuffle needs a minimum of two columns. Thus, in the Masking Configuration, the masking functions selected and columns to be masked are validated first for any configuration and schema-related errors and warnings.
Now let's select columns as shown below to apply masking. By default, it uses the Shuffle masking function and we can pick any other masking function by clicking on the drop-down list as it is done for the SSN column below. Basically, this new security feature provides five masking functions: Shuffle, Group Shuffle, Null, Single Value and String Composite. We will walk through each of them in our next section.
Understanding SQL Server Static Data Masking Functions
Let's first set the masking criteria for the columns that we need to mask in our data sample. Here, we would like to perform the Shuffle masking function on the AddressLine column, Single Value on the DateOfBirth column, Null on the EmailId column, Group Shuffle on the FirstName column and String Composite on the SSN column. These options are shown in the table below.
|Column Name||Masking Function|
All columns of table Customers are displayed in the screenshot below. Click on the checkbox next to each column that we need to mask and select the applicable masking function for each column as shown below. Let's quickly see how these masking functions work.
SQL Server Static Data Masking - Shuffle
With Shuffle masking, values in the column AddressLine are shuffled to new rows and there is no new value introduced. If there is a NULL value in the column, we have an option to not replace/shuffle this NULL value. To perform this, click on configure, and check the box Maintain NULL positions and click Okay.
SQL Server Static Data Masking - Group Shuffle
With Group Shuffle masking, we can bind more than one column in a group shuffle masking. Here we are applying it on columns FirstName and LastName as shown below.
SQL Server Static Data Masking - Null
With Null masking, values in the column EmailId are replaced with Null. The masked column has to allow NULL values for it to support Null masking.
SQL Server Static Data Masking - Single Value
Now, let's move forward and understand the Single Value masking function. Click on the configure hyperlink next to the Single Value option and type this value '2000-01-01' in the pop-up screen and click Okay. This will replace all the values in the column and assign this single value to the entire column DateOfBirth. We have to make sure the datatype of the selected column is similar to the format of the input value.
SQL Server Static Data Masking - String Composite
With String Composite masking, we can format the entire column or just a part of the column value.
Here we want to apply this masking on column SSN. Click on configure, check Advanced and select Social Security Number (keep last four digits). This will populate the required pattern values in the textboxes. This masks the column SSN from say 123-45-6789 to XXX-YY-6789.
Once the masking configuration is complete, we can choose the location on the server where the backup file will be stored. This backup option is only for on-premises databases. Also, we will have to specify the masking database name as shown above where we have named the cloned database StaticDataMaskingDBMasked.
Static Data Masking automatically creates a folder in the documents folder named Static Data Masking and stores all log files inside it. This becomes handy for debugging purposes.
Saving and loading configurations for SQL Server Static Data Masking
Additionally, we can save the masking configuration using the Save Config button located on the top of the configuration window by providing a file name and clicking on Save. This file will be stored in an XML format and can be loaded using the Load Config option.
Click OK at the bottom right corner of the configuration window to apply the Static Data Masking. A message "Masking complete!" is displayed confirming the static data masking is complete.
Validation of SQL Server Masked Data
Now that we have a backup file of the database we can restore and check the results.
Let's quickly verify the static data masking that we just applied on the cloned database. The screenshot below provides a comparison of unmasked and masked data.
- FirstName and LastName columns are shuffled with each other and all values in the column.
- DateOfBirth column has been assigned a single value 2000-01-01.
- Records in AddressLine are shuffled to new rows.
- SSN column shows the last four digits have been preserved with regular expression pattern for the first 5 digits.
- All entries in the column EmailId are null.
We observed how static data masking helps in securing sensitive data in development and testing environments and how easy it is to manage and configure it. We also walked through several masking functions supported by this new feature and how it operates for each selected field.
- Try working on a few examples using Static Data Masking in the scenarios that may seem useful to you.
- Refer to these security-related tips to learn more about security in SQL Server.
Last Updated: 2019-02-20
About the author
View all my tips