Masking Personal Identifiable SQL Server Data
By: Rick Dobson | Updated: 2013-11-13 | Comments (9) | Related: More > Encryption
How can our organization hide the contents of non-public data fields while still making the data's layout available to database administrators, developers, and business analysts who must work with the data? More specifically, how can we make the data fields available without disclosing confidential information such as social security numbers, phone numbers, credit card numbers, and last names. Check out this tip to learn more.
You can hide personal identity and other confidential data by obscuring values within fields containing these kinds of data. The obscuring process is typically described as data masking or sanitization. The original data is transformed so that those using the data cannot readily discover the obscured field values within individual records. You can perform this masking for the columns of a database at rest or just before displaying the data in a form window.
This tip demonstrates an approach to obscuring values in a way that the original values are changed to other values from the same set of characters and/or numbers. The transformation presented here can be reversed so that the original values can be restored. This reversibility feature can reduce the need to maintain data in obscured and non-obscured forms, and the feature facilitates confirming the validity of an obscuring process by comparing the restored values with the original values.
Masking is not a replacement for encryption or other forms of database security. Encryption recodes data fields so that they cannot be viewed without a key. Masking is attractive for scenarios where developers, database administrators, and/or business analysts need access to database fields without necessarily requiring access to valid values for those fields. Depending of your needs, you can use data masking in concert with other forms of security as required by your organization. For example, it is not uncommon to encrypt masked social security numbers. In other cases, your organization may care to mask one field and encrypt another.
Start with a SQL Server Data Masking Table
A masking table is the key to obscuring your data with the method illustrated in this tip. For each position that is to be obscured in a field, the masking table has two columns, an original text value and a mask text value. The obscuring process takes each position (or some subset of the positions) in a field and transforms from the original text value to the mask text value.
The following screen shot from a SQL Server Results tab shows an excerpt from the masking table used in this tip. Notice that the table contains three columns -- one for position, another for original text, and a third for mask text. In order to make the obscuring process reversible, each original text character can map for any given position to just one unique mask text character. In general, you will want to allow the same original text value to map to different mask text values across different positions.
Lets say you were masking just a two-character numeric column whose value for the current row was 83. The mask text value for an original text value of 8 in position 1 is 0, and the mask text value for an original text value of 3 in position 2 is 6. Therefore, the obscured representation for the original value of 83 is 06.
The sample masking table with the code for this tip has original-to-mask text transformation for 10 positions. At each position, there are transformations for each of the ten characters from 0 through 9. You can consequently use a masking table like the one in this example for obscuring social security numbers that have nine characters as well as 7 and 10 digit telephone numbers. The sample transformation for this tip is for social security numbers.
A masking table is not limited to just numeric characters. You can include the letters of the alphabet in both upper and lower case or just one case as well as special characters. There is no restriction that numeric characters must map to just numeric characters. For example, numeric characters can map to both numbers and the letters in the alphabet. Again, the key restriction to support the reversing of the masking process is that each original text value map to just one mask text value at each position. Even this restriction on the masking table can be relaxed if there is no value to being able to reverse the obscuring process.
The following code excerpt presents one approach to creating and populating a masking table, such as the one in the preceding screen shot. For demonstration purposes, the masking table is a table in the tempdb database with the name ##Tips_Masking_Table. The column names are PositionNumber, OriginalText, and MaskText. A succession of INSERT statements populates the columns for each row. In the full code listings for this tip, row values are specified for 10 numeric characters in a field.
-- Drop ##Tips Masking Table before re-creating it IF OBJECT_ID('##Tips_Masking_Table') IS NOT NULL DROP TABLE dbo.##Tips_Masking_Table CREATE TABLE ##Tips_Masking_Table ( PositionNumber int NOT NULL, OriginalText char(1) NOT NULL, MaskText char(1) ) -- Insert rows into the masking table INSERT INTO ##Tips_Masking_Table VALUES(1,'0','8') INSERT INTO ##Tips_Masking_Table VALUES(1,'1','7') INSERT INTO ##Tips_Masking_Table VALUES(1,'2','6') INSERT INTO ##Tips_Masking_Table VALUES(1,'3','5') INSERT INTO ##Tips_Masking_Table VALUES(1,'4','4') INSERT INTO ##Tips_Masking_Table VALUES(1,'5','3') INSERT INTO ##Tips_Masking_Table VALUES(1,'6','2') INSERT INTO ##Tips_Masking_Table VALUES(1,'7','1') INSERT INTO ##Tips_Masking_Table VALUES(1,'8','0') INSERT INTO ##Tips_Masking_Table VALUES(1,'9','9') INSERT INTO ##Tips_Masking_Table VALUES(2,'0','9') INSERT INTO ##Tips_Masking_Table VALUES(2,'1','0') INSERT INTO ##Tips_Masking_Table VALUES(2,'2','2') INSERT INTO ##Tips_Masking_Table VALUES(2,'3','6') INSERT INTO ##Tips_Masking_Table VALUES(2,'4','4') INSERT INTO ##Tips_Masking_Table VALUES(2,'5','7') INSERT INTO ##Tips_Masking_Table VALUES(2,'6','5') INSERT INTO ##Tips_Masking_Table VALUES(2,'7','1') INSERT INTO ##Tips_Masking_Table VALUES(2,'8','3') INSERT INTO ##Tips_Masking_Table VALUES(2,'9','8')
In the purposely simplified example for this tip, the masking table has just 100 rows -- 10 original-to-mask text transformations for each of 10 positions. The more characters that your transformation masks or sanitizes, the more rows you will have in your table. Also, there is no restriction on how many positions you can transform with this technique. The number of rows in your masking table grows as you equip your masking table to mask more positions and more characters, such as alphabetic as well as numeric characters.
This tip is a purposely simplified demonstration of a basic approach to masking. Depending on how access to the tempdb database is managed within an organization, you may find that a temporary table does not meet the security objectives for restricting access to the masking table. Other approaches may include using a table variable within a stored procedure or other database object with restricted permissions. Alternatively, you may use a permanent table with very restricted access rights. As with encryption keys, the rules for managing access to a masking table depend on the security environment within an organization and the established practices within an organization for managing security.
Masking One Value at a Time
One of the easiest ways to develop a practical understanding of how to implement masking is to modify one field value at a time. Excerpts from the Tips_SSN_Mask stored procedure illustrate key elements of how to mask data one field value at a time. The USE and CREATE PROCEDURE header statements below demonstrate how to create a stored procedure named Tips_SSN_Mask in the dbo schema of the TipsMaskUnMask database. You should create the database on your SQL Server instance before attempting to invoke the script. The SSN_IN parameter can hold up to nine characters for an original social security number. The SSN_OUT output parameter can also hold up to 9 characters for a corresponding masked social security number.
USE TipsMaskUnMask GO CREATE PROCEDURE dbo.Tips_SSN_Mask -- Add the parameters for the stored procedure here @SSN_IN varchar(9), @SSN_OUT varchar(9) OUTPUT AS
Next, you'll need to isolate each of the characters in the input and output parameters. This is because you mask characters in a value one character at a time. The local variables @P1 through @P9 are for the input parameter characters. There is one character designated for each of the nine possible characters in a social security number. Similarly, the local variables @N1 through @N9 are for the output parameter characters; these characters store the masked values for the original characters in the input parameter.
-- Declare incoming (@P) and outgoing (@N) characters DECLARE @P1 char(1) ,@P2 char(1) ,@P3 char(1) ,@P4 char(1) ,@P5 char(1) ,@P6 char(1) ,@P7 char(1) ,@P8 char(1) ,@P9 char(1) ,@N1 char(1) ,@N2 char(1) ,@N3 char(1) ,@N4 char(1) ,@N5 char(1) ,@N6 char(1) ,@N7 char(1) ,@N8 char(1) ,@N9 char(1)
There are three elements to the body of the stored procedure. First, the characters of the input parameter need to be assigned to the @P1 through @P9 local variables. Next, the lookup of the original text values must be performed for each character, and the matching mask text value assigned to the @N1 through @N9 local variables. Third, the characters in the @N1 through @N9 local variables must be combined into the output parameter value containing the obscured social security number.
The following code excerpt shows the assignment of values to the @P1 through @P9 local variables as well as the transformations to assign values to @N1 and @N2. The stored procedure repeats the process demonstrated for @N1 and @N2 to assign values to @N3 through @N9.
A SUBSTRING function sequentially extracts each of the up to nine characters from the input parameter to the @P1 through @P9 local variables. SELECT statements sequentially look up the original characters for each position in ##Tips_Masking_Table to find a matching mask text value from the table. In the first lookup, the return value from the SELECT statement is assigned to @N1. If the value of @P1 is not in ##Tips_Masking_Table, then the SELECT statement returns a NULL value. This can happen if the social security number value contains characters other than 0 through 9, such as a hyphen (-) or just a typo for an alphabetic character. The ISNULL function after the lookup SELECT statement restores the original non-numeric value. This step ensures that you can reverse the process and always recover the original values.
--Extract individual characters from @SSN_IN SET @P1 = SUBSTRING(@SSN_IN,1,1) SET @P2 = SUBSTRING(@SSN_IN,2,1) SET @P3 = SUBSTRING(@SSN_IN,3,1) SET @P4 = SUBSTRING(@SSN_IN,4,1) SET @P5 = SUBSTRING(@SSN_IN,5,1) SET @P6 = SUBSTRING(@SSN_IN,6,1) SET @P7 = SUBSTRING(@SSN_IN,7,1) SET @P8 = SUBSTRING(@SSN_IN,8,1) SET @P9 = SUBSTRING(@SSN_IN,9,1) -- Assign MaskText for corresponding -- OriginalText SET @N1 = ( SELECT MaskText FROM dbo.##Tips_Masking_Table WHERE PositionNumber = 1 AND OriginalText = @P1) SET @N1 = ISNULL(@N1,@P1) SET @N2 = ( SELECT MaskText FROM dbo.##Tips_Masking_Table WHERE PositionNumber = 2 AND OriginalText = @P2) SET @N2 = ISNULL(@N2,@P2)
The last part of the masking implementation is to construct the output parameter from the nine transformed values for the input parameter. You can do this with a simple plus (+) operator in an assignment statement for the output parameter. The code excerpt below shows the application of the plus operator along with a couple of PRINT statements. The PRINT statements return the original and processed value to the Messages pane in SQL Server Management Studio.
-- Print Original Text SSN_IN version -- Print Masked Text SSN_OUT version Print @SSN_IN SET @SSN_OUT = @N1 + @N2 + @N3 + @N4 + @N5 + @N6 + @N7 + @N8 + @N9 PRINT @SSN_OUT
Reversing the Masking Process in SQL Server
You can reverse the masking process to recover the original value from a previously sanitized column value by looking up the OriginalText Column value matching the MaskText value for a previously sanitized column value. The following code excerpt from the Tips_SSN_UnMask stored procedure illustrates the approach with our ##Tips_Masking_Table masking table for the first two characters in a column.
In the code excerpt below, the @P1 through @P2 characters are the first and second characters from a previously sanitized column value. The characters @N1 and @N2 represent de-sanitized values. These de-sanitized characters should match the originally masked characters.
-- Assign OriginalText for corresponding -- MaskText SET @N1 = ( SELECT OriginalText FROM dbo.##Tips_Masking_Table WHERE PositionNumber = 1 AND MaskText = @P1) SET @N1 = ISNULL(@N1,@P1) SET @N2 = ( SELECT OriginalText FROM dbo.##Tips_Masking_Table WHERE PositionNumber = 2 AND MaskText = @P2) SET @N2 = ISNULL(@N2,@P2)
Demonstrating SQL Server Data Masking and Unmasking
The next script shows a couple of EXECUTE statements that illustrate the use of the previously described stored procedures to successively demonstrate the masking and the unmasking of social security values. The first EXECUTE statement invokes the Tips_SSN_Mask stored procedure from the dbo schema of the TipsMaskUnMask database. The second EXECUTE statement launches the Tips_SSN_UnMask stored procedure from the same schema and database. You can use the scripts that come with the tip to create the stored procedures, but you must have previously created the database manually or programmatically.
The first EXECUTE statement passes a social security number with a value of 078051120 to the Tips_SSN_Mask. According this online resource and numerous other ones, this value is known to be an invalid social security number. The output parameter from the stored procedure with the masked value for the social security number populates the @SSN_OUT local variable in the script.
The second EXECUTE statement takes the newly populated @SSN_OUT local variable and uses it as an input parameter for the Tips_SSN_UnMask stored procedure. The return value from this second stored procedure replaces the original value in the @SSN_OUT local variable with a de-sanitized value for the input parameter.
DECLARE @SSN_IN varchar(9) DECLARE @SSN_OUT varchar(9) -- invalidated SSN: 078051120 -- Mask invalidated SSN EXECUTE [TipsMaskUnMask].[dbo].[Tips_SSN_Mask] '078051120' ,@SSN_OUT OUTPUT -- Restore invalidated SSN from masked version EXECUTE [TipsMaskUnMask].[dbo].[Tips_SSN_UnMask] @SSN_OUT ,@SSN_OUT OUTPUT
The image below shows the Messages panel output created by the preceding two stored procedure invocations. The first pair of values are generated by the Tips_SSN_Mask stored procedure. The first value in the pair reflects the original input -- namely, 078051120. The second value in the pair illustrates the masked conversion of the input parameter.
The last pair of values in the image are from the Tips_SSN_UnMask stored procedure. The first value in this second pair has the value of 813026553. This value is stored in the @SSN_OUT local variable, which receives the output parameter value from the previously invoked Tips_SSN_Mask stored procedure. The call of the Tips_SSN_UnMask stored procedure assigns a new value to the @SSN_OUT local variable. This new value is the original input parameter to the Tips_SSN_Mask stored procedure -- namely, 078051120. The output from the Tips_SSN_UnMask confirms the success of the preceding call to the Tips_SSN_Mask stored procedure.
If you decide you want to learn more about this topic, one obvious next step is to download the scripts discussed in this tip and start modifying them. It is possible to organize the elements of a masking solution in many different ways. For example, the implementation to mask a social security number discussed here might be helpful in creating input for an unbound form that displays social security number and other data fields for an individual. You can also obscure database values while they are at rest so that retrieving social security numbers will initially return sanitized values. A pair of scripts in the download for this tip illustrates an approach to masking and unmasking for database columns in their resting state. These scripts are not explicitly discussed in the tip, but you will be able to recognize key blocks of T-SQL code discussed in the tip.
As you build custom masking solutions for the organizations that you serve, I highly recommend that you secure key elements of the solution. The masking table is one key element. How many and which characters from a field you decide to mask is another element. You may also decide to double or even triple mask some fields to make it more difficult to reverse engineer a field's masking.
Here's a selection of references to learn more about masking and encryption. If you examine these references, you quickly discover that this tip introduces just one kind of data masking.
- Data masking
- Understanding and Selecting Data Masking Solutions: Creating Secure and Useful Data
- Data Masking and Data Encryption Are Different
- Encryption Tips
- Using Views to Expose Encrypted Data in SQL Server
- Download this source code.
Last Updated: 2013-11-13
About the author
View all my tips