Use Dynamic Data Masking in SQL Server 2016 CTP 2.1

Problem

SQL Server 2016 introduces a new security feature called Dynamic Data Masking (DDM). In this tip we’ll describe its purpose, show a brief example of how it works, list some limitations, and note how the feature has already changed since CTP 2.0 was first released in May.

Note: This tip was written when CTP 2.1 was the most recent CTP. Some of these details may change by the time SQL Server 2016 is released.

Solution

Dynamic Data Masking is not meant to be a bulletproof security measure like proper encryption. You can consider it to be more like obfuscation of personally-identifiable information (PII) from lesser-privileged users, and it can be easy to apply the masking constraints to, say, a development or test environment where this information should not be visible.

Note that for CTP 2.0, at least at the time of writing, the documentation shows that trace flags 209 and 219 are required to use Dynamic Data Masking. This restriction has been lifted since CTP 2.1, and should only be necessary if you are still running build 13.0.200. (In fact if you enable the trace flags in newer builds, it seems to have the opposite effect: You get an error message when trying to use DDM.)

To show how the feature works, let’s create a simple set of 10 rows from the metadata:

USE tempdb;
GO
SELECT TOP (10) 
  ID        = IDENTITY(INT, 1, 1),
  FirstName = RIGHT(o.name, 8), 
  LastName  = LEFT(o.name, 12), 
  Email     = LEFT(o.name, 9) + '@' + RIGHT(o.name, 11) + '.net',
  SSN       = STUFF(STUFF(RIGHT('000000000' 
            + RTRIM(ABS(CHECKSUM(NEWID()))),9),4,0,'-'),7,0,'-'),
  BirthDate = DATEADD(DAY, -ABS(CHECKSUM(NEWID())%10000), o.modify_date)
INTO dbo.DDM
FROM sys.all_objects AS o
ORDER BY NEWID();

Now, let’s add some masks to some of these columns to protect the privacy of the data from normal (non-sysadmin) users. There are three types of masks currently supported:

default()
- for strings, shows x for each character (up to 4)
- for numeric types, shows 0
- for dates, shows 2000-01-01
email()
- reveals the first character, then replaces the remainder with XXX@XXXX.com
partial()
- you can define a custom string to represent the mask, including how many leading and trailing characters to reveal from the original string (examples below)

So for these columns, the following mask settings might make sense:

ALTER TABLE dbo.DDM ALTER COLUMN FirstName 
    ADD MASKED WITH (FUNCTION = 'partial(1, "XXXXX", 0)');
    -- show only the first character of the first name
ALTER TABLE dbo.DDM ALTER COLUMN LastName  
    ADD MASKED WITH (FUNCTION = 'partial(2, "XXXXXXXX", 1)');
    -- show the first two characters and the last character of the last name
ALTER TABLE dbo.DDM ALTER COLUMN Email     
    ADD MASKED WITH (FUNCTION = 'email()');
    -- all addresses will show as nXXX@XXXX.com
ALTER TABLE dbo.DDM ALTER COLUMN SSN       
    ADD MASKED WITH (FUNCTION = 'partial(1,"XX-XX-XXX",1)');
    -- SSNs will become nXX-XX-XXXn
ALTER TABLE dbo.DDM ALTER COLUMN BirthDate 
    ADD MASKED WITH (FUNCTION = 'default()');
    -- all Birthdates will show as 2000-01-01

Now, what does this data look like? Well, to a sysadmin or db_owner, there is no masking implemented at all (note that the script populates arbitrary data, so your values won’t be the same):

SELECT * FROM dbo.DDM;

However, if we create a peon user and just grant them SELECT on the table:

CREATE USER blat WITHOUT LOGIN;
GRANT SELECT ON dbo.DDM TO blat;

Then we can try running the same query in their security context:

EXECUTE AS USER = N'blat';
GO
SELECT * FROM dbo.DDM;
GO
REVERT;

And the results:

Now we can see that the data is masked as expected.

Limitations

Bypassing masking

The current version of the documentation states that you can bypass the dynamic data masking by applying explicit conversions to masked columns in ad hoc queries. This was possible in CTP 2.0, but is no longer possible as of CTP 2.1.

EXECUTE AS USER = N'blat';
GO
SELECT TOP (1) FirstName, CAST(FirstName AS VARCHAR(32)) FROM dbo.DDM;
GO
REVERT;

In CTP 2.0, this would have yielded something like:

sXXXXX  sactions

In CTP 2.1, the data masking does not apply exactly as specified, but still protects the data; it reverts to the default four-character masking:

sXXXXX  xxxx

There is another way to bypass the masking functionality, at least as of CTP 2.1: Involve a second table.

CREATE TABLE dbo.SecondTable(ID INT);
INSERT dbo.SecondTable(ID) VALUES(1);
GO
EXECUTE AS USER = N'blat';
GO
SELECT d.FirstName FROM dbo.DDM AS d
  WHERE EXISTS (SELECT 1 FROM dbo.SecondTable AS s
    WHERE s.ID = d.ID);
GO
REVERT;

This will yield the un-masked value:

sactions

In fact all that’s really required is to give the table an alias other than its own name; even when a join is not present, this will disable any masking when the data is returned (again, in CTP 2.1; I’m sure this will be fixed in future builds):

EXECUTE AS USER = N'blat';
GO
SELECT TOP (1) FirstName FROM dbo.DDM AS d ORDER BY d.ID;
GO
REVERT;

Results:

sactions

If you drop the alias or use “DDM” as the alias, masking still occurs, so it only seems to get thwarted if there is some type of mapping to a new alias. It is important to note that this means that, even if you create stored procedures, your users may still have access to the un-obfuscated data, as they likely would with ad hoc query privileges.

Data Types

As with most v1 features, there are some limitations here, which may or may not be permanent. The following data types are unlikely to be supported when SQL Server 2016 is released:

(var)binary / image
xml
sql_variant
hierarchyid
uniqueidentifier
rowversion (timestamp)
spatial types

Also, not all masking functions are relevant for all data types: One of the first things I tried was to only mask the month/day of the birthdate, so you could determine approximate age without really getting into PII. I received this error message:

Msg 16003, Level 16, State 0
The data type of column 'BirthDate' does not support data 
masking function 'partial'.

Finally, you won’t be able to apply data masking functions against encrypted columns or columns that participate in a sparse column set.

Changing/removing a mask

To change the definition for an existing masked column, you can just run the same ALTER COLUMN command (it is not intuitive, but there is no change/alter mask DDL (yet)):

ALTER TABLE dbo.DDM ALTER COLUMN SSN       
    ADD MASKED WITH (FUNCTION = 'partial(2,"X-XX-XXX",1)');
    -- SSNs will now become nnX-XX-XXXn

And to remove the mask definition altogether, you can just say DROP MASKED:

ALTER TABLE dbo.DDM ALTER COLUMN SSN DROP MASKED;

Summary

Dynamic data masking provides a simple way to implement obfuscation of private data (and in my tests so far, the performance impact is negligible). It is important to understand the limitations, and keep in mind that it is not true encryption and that the data cannot be protected in all scenarios. In another tip I will talk about a more end-to-end way to protect sensitive data, even from the sysadmin role, in SQL Server 2016: Always Encrypted.

Next Steps

Download the latest SQL Server 2016 CTP (or register for a trial of Azure SQL Database v12, where this feature appeared first).
Try out Dynamic Data Masking in scenarios where it may seem useful.
See these other security-related tips:

Aaron Bertrand

Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He also blogs at sqlblog.org.

MSSQLTips Awards: Author of the Year – 2016, 2023 | Leadership (200+ tips) – 2022

Bypassing masking

Data Types

Leave a ReplyCancel Reply