Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Masking Personal Identifiable SQL Server Data


By:   |   Read Comments (7)   |   Related Tips: More > Encryption

Quickly Resolve Performance Problems for IIS, .NET and SQL Server       >>>   Get Started


Problem

How can our organization hide the contents of non-public data fields while still making the data's layout available to database administrators, developers, and business analysts who must work with the data? More specifically, how can we make the data fields available without disclosing confidential information such as social security numbers, phone numbers, credit card numbers, and last names.  Check out this tip to learn more.

Solution

You can hide personal identity and other confidential data by obscuring values within fields containing these kinds of data. The obscuring process is typically described as data masking or sanitization. The original data is transformed so that those using the data cannot readily discover the obscured field values within individual records. You can perform this masking for the columns of a database at rest or just before displaying the data in a form window.

This tip demonstrates an approach to obscuring values in a way that the original values are changed to other values from the same set of characters and/or numbers. The transformation presented here can be reversed so that the original values can be restored. This reversibility feature can reduce the need to maintain data in obscured and non-obscured forms, and the feature facilitates confirming the validity of an obscuring process by comparing the restored values with the original values.

Masking is not a replacement for encryption or other forms of database security. Encryption recodes data fields so that they cannot be viewed without a key. Masking is attractive for scenarios where developers, database administrators, and/or business analysts need access to database fields without necessarily requiring access to valid values for those fields. Depending of your needs, you can use data masking in concert with other forms of security as required by your organization. For example, it is not uncommon to encrypt masked social security numbers. In other cases, your organization may care to mask one field and encrypt another.

Start with a SQL Server Data Masking Table

A masking table is the key to obscuring your data with the method illustrated in this tip. For each position that is to be obscured in a field, the masking table has two columns, an original text value and a mask text value. The obscuring process takes each position (or some subset of the positions) in a field and transforms from the original text value to the mask text value.

The following screen shot from a SQL Server Results tab shows an excerpt from the masking table used in this tip. Notice that the table contains three columns -- one for position, another for original text, and a third for mask text. In order to make the obscuring process reversible, each original text character can map for any given position to just one unique mask text character. In general, you will want to allow the same original text value to map to different mask text values across different positions.

A masking table is the key to obscuring your data with the method illustrated in this tip.

Lets say you were masking just a two-character numeric column whose value for the current row was 83. The mask text value for an original text value of 8 in position 1 is 0, and the mask text value for an original text value of 3 in position 2 is 6. Therefore, the obscured representation for the original value of 83 is 06.

The sample masking table with the code for this tip has original-to-mask text transformation for 10 positions. At each position, there are transformations for each of the ten characters from 0 through 9. You can consequently use a masking table like the one in this example for obscuring social security numbers that have nine characters as well as 7 and 10 digit telephone numbers. The sample transformation for this tip is for social security numbers.

A masking table is not limited to just numeric characters. You can include the letters of the alphabet in both upper and lower case or just one case as well as special characters. There is no restriction that numeric characters must map to just numeric characters. For example, numeric characters can map to both numbers and the letters in the alphabet. Again, the key restriction to support the reversing of the masking process is that each original text value map to just one mask text value at each position. Even this restriction on the masking table can be relaxed if there is no value to being able to reverse the obscuring process.

The following code excerpt presents one approach to creating and populating a masking table, such as the one in the preceding screen shot. For demonstration purposes, the masking table is a table in the tempdb database with the name ##Tips_Masking_Table. The column names are PositionNumber, OriginalText, and MaskText. A succession of INSERT statements populates the columns for each row. In the full code listings for this tip, row values are specified for 10 numeric characters in a field.

-- Drop ##Tips Masking Table before re-creating it
IF OBJECT_ID('##Tips_Masking_Table') IS NOT  NULL
      DROP TABLE dbo.##Tips_Masking_Table
CREATE TABLE ##Tips_Masking_Table (
PositionNumber int NOT  NULL,
OriginalText char(1) NOT NULL,
MaskText char(1)
)
 
-- Insert rows into the masking table
INSERT INTO ##Tips_Masking_Table  VALUES(1,'0','8')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'1','7')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'2','6')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'3','5')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'4','4')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'5','3')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'6','2')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'7','1')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'8','0')
INSERT INTO ##Tips_Masking_Table  VALUES(1,'9','9')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'0','9')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'1','0')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'2','2')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'3','6')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'4','4')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'5','7')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'6','5')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'7','1')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'8','3')
INSERT INTO ##Tips_Masking_Table  VALUES(2,'9','8')

In the purposely simplified example for this tip, the masking table has just 100 rows -- 10 original-to-mask text transformations for each of 10 positions. The more characters that your transformation masks or sanitizes, the more rows you will have in your table. Also, there is no restriction on how many positions you can transform with this technique. The number of rows in your masking table grows as you equip your masking table to mask more positions and more characters, such as alphabetic as well as numeric characters.

This tip is a purposely simplified demonstration of a basic approach to masking. Depending on how access to the tempdb database is managed within an organization, you may find that a temporary table does not meet the security objectives for restricting access to the masking table. Other approaches may include using a table variable within a stored procedure or other database object with restricted permissions. Alternatively, you may use a permanent table with very restricted access rights. As with encryption keys, the rules for managing access to a masking table depend on the security environment within an organization and the established practices within an organization for managing security.

Masking One Value at a Time

One of the easiest ways to develop a practical understanding of how to implement masking is to modify one field value at a time. Excerpts from the Tips_SSN_Mask stored procedure illustrate key elements of how to mask data one field value at a time. The USE and CREATE PROCEDURE header statements below demonstrate how to create a stored procedure named Tips_SSN_Mask in the dbo schema of the TipsMaskUnMask database. You should create the database on your SQL Server instance before attempting to invoke the script. The SSN_IN parameter can hold up to nine characters for an original social security number. The SSN_OUT output parameter can also hold up to 9 characters for a corresponding masked social security number.

USE TipsMaskUnMask
GO
CREATE PROCEDURE dbo.Tips_SSN_Mask 
      -- Add the parameters for the stored procedure here
      @SSN_IN varchar(9), 
      @SSN_OUT varchar(9) OUTPUT
AS

Next, you'll need to isolate each of the characters in the input and output parameters. This is because you mask characters in a value one character at a time. The local variables @P1 through @P9 are for the input parameter characters. There is one character designated for each of the nine possible characters in a social security number. Similarly, the local variables @N1 through @N9 are for the output parameter characters; these characters store the masked values for the original characters in the input parameter.

-- Declare incoming (@P) and outgoing (@N) characters
DECLARE
             @P1 char(1)
            ,@P2 char(1)
            ,@P3 char(1)
            ,@P4 char(1)
            ,@P5 char(1)
            ,@P6 char(1)
            ,@P7 char(1)
            ,@P8 char(1)
            ,@P9 char(1)
            ,@N1 char(1)
            ,@N2 char(1)
            ,@N3 char(1)
            ,@N4 char(1)
            ,@N5 char(1)
            ,@N6 char(1)
            ,@N7 char(1)
            ,@N8 char(1)
            ,@N9 char(1)

There are three elements to the body of the stored procedure. First, the characters of the input parameter need to be assigned to the @P1 through @P9 local variables. Next, the lookup of the original text values must be performed for each character, and the matching mask text value assigned to the @N1 through @N9 local variables. Third, the characters in the @N1 through @N9 local variables must be combined into the output parameter value containing the obscured social security number.

The following code excerpt shows the assignment of values to the @P1 through @P9 local variables as well as the transformations to assign values to @N1 and @N2. The stored procedure repeats the process demonstrated for @N1 and @N2 to assign values to @N3 through @N9.

A SUBSTRING function sequentially extracts each of the up to nine characters from the input parameter to the @P1 through @P9 local variables. SELECT statements sequentially look up the original characters for each position in ##Tips_Masking_Table to find a matching mask text value from the table. In the first lookup, the return value from the SELECT statement is assigned to @N1. If the value of @P1 is not in ##Tips_Masking_Table, then the SELECT statement returns a NULL value. This can happen if the social security number value contains characters other than 0 through 9, such as a hyphen (-) or just a typo for an alphabetic character. The ISNULL function after the lookup SELECT statement restores the original non-numeric value. This step ensures that you can reverse the process and always recover the original values.

--Extract individual characters from @SSN_IN
      SET @P1 =  SUBSTRING(@SSN_IN,1,1)
      SET @P2 =  SUBSTRING(@SSN_IN,2,1)
      SET @P3 =  SUBSTRING(@SSN_IN,3,1)
      SET @P4 =  SUBSTRING(@SSN_IN,4,1)
      SET @P5 =  SUBSTRING(@SSN_IN,5,1)
      SET @P6 =  SUBSTRING(@SSN_IN,6,1)
      SET @P7 =  SUBSTRING(@SSN_IN,7,1)
      SET @P8 =  SUBSTRING(@SSN_IN,8,1)
      SET @P9 =  SUBSTRING(@SSN_IN,9,1)
      -- Assign MaskText for corresponding
      -- OriginalText
      SET @N1 = 
      (
      SELECT MaskText
      FROM dbo.##Tips_Masking_Table
      WHERE
                  PositionNumber = 1
                  AND
                  OriginalText = @P1)            
      SET @N1 =  ISNULL(@N1,@P1)                 
      SET @N2 = 
      (
      SELECT MaskText
      FROM dbo.##Tips_Masking_Table
      WHERE
                  PositionNumber = 2
                  AND
                  OriginalText = @P2)
      SET @N2 =  ISNULL(@N2,@P2)

The last part of the masking implementation is to construct the output parameter from the nine transformed values for the input parameter. You can do this with a simple plus (+) operator in an assignment statement for the output parameter. The code excerpt below shows the application of the plus operator along with a couple of PRINT statements. The PRINT statements return the original and processed value to the Messages pane in SQL Server Management Studio.

      -- Print Original Text SSN_IN version
      -- Print Masked Text SSN_OUT version
      Print @SSN_IN           
      SET @SSN_OUT = @N1 + @N2 + @N3  + @N4 + @N5 + @N6 + @N7 + @N8  + @N9
      PRINT @SSN_OUT

Reversing the Masking Process in SQL Server

You can reverse the masking process to recover the original value from a previously sanitized column value by looking up the OriginalText Column value matching the MaskText value for a previously sanitized column value. The following code excerpt from the Tips_SSN_UnMask stored procedure illustrates the approach with our ##Tips_Masking_Table masking table for the first two characters in a column.

In the code excerpt below, the @P1 through @P2 characters are the first and second characters from a previously sanitized column value. The characters @N1 and @N2 represent de-sanitized values. These de-sanitized characters should match the originally masked characters.

      -- Assign OriginalText for corresponding
      -- MaskText      
      SET @N1 = 
      (
      SELECT OriginalText
      FROM dbo.##Tips_Masking_Table
      WHERE
                  PositionNumber = 1
                  AND
                  MaskText = @P1)          
      SET @N1 =  ISNULL(@N1,@P1)                 
      SET @N2 = 
      (
      SELECT OriginalText
      FROM dbo.##Tips_Masking_Table
      WHERE
                  PositionNumber = 2
                  AND
                  MaskText = @P2)
      SET @N2 =  ISNULL(@N2,@P2)

Demonstrating SQL Server Data Masking and Unmasking

The next script shows a couple of EXECUTE statements that illustrate the use of the previously described stored procedures to successively demonstrate the masking and the unmasking of social security values. The first EXECUTE statement invokes the Tips_SSN_Mask stored procedure from the dbo schema of the TipsMaskUnMask database. The second EXECUTE statement launches the Tips_SSN_UnMask stored procedure from the same schema and database. You can use the scripts that come with the tip to create the stored procedures, but you must have previously created the database manually or programmatically.

The first EXECUTE statement passes a social security number with a value of 078051120 to the Tips_SSN_Mask. According this online resource and numerous other ones, this value is known to be an invalid social security number. The output parameter from the stored procedure with the masked value for the social security number populates the @SSN_OUT local variable in the script.

The second EXECUTE statement takes the newly populated @SSN_OUT local variable and uses it as an input parameter for the Tips_SSN_UnMask stored procedure. The return value from this second stored procedure replaces the original value in the @SSN_OUT local variable with a de-sanitized value for the input parameter.

DECLARE @SSN_IN varchar(9) 
DECLARE @SSN_OUT varchar(9) 
-- invalidated SSN: 078051120
-- Mask invalidated SSN 
EXECUTE [TipsMaskUnMask].[dbo].[Tips_SSN_Mask] 
'078051120' 
,@SSN_OUT OUTPUT 
 
-- Restore invalidated SSN from masked version 
EXECUTE [TipsMaskUnMask].[dbo].[Tips_SSN_UnMask] 
 @SSN_OUT 
,@SSN_OUT OUTPUT

The image below shows the Messages panel output created by the preceding two stored procedure invocations. The first pair of values are generated by the Tips_SSN_Mask stored procedure. The first value in the pair reflects the original input -- namely, 078051120. The second value in the pair illustrates the masked conversion of the input parameter.

The last pair of values in the image are from the Tips_SSN_UnMask stored procedure. The first value in this second pair has the value of 813026553. This value is stored in the @SSN_OUT local variable, which receives the output parameter value from the previously invoked Tips_SSN_Mask stored procedure. The call of the Tips_SSN_UnMask stored procedure assigns a new value to the @SSN_OUT local variable. This new value is the original input parameter to the Tips_SSN_Mask stored procedure -- namely, 078051120. The output from the Tips_SSN_UnMask confirms the success of the preceding call to the Tips_SSN_Mask stored procedure.

Demonstrating Masking and Unmasking
Next Steps

If you decide you want to learn more about this topic, one obvious next step is to download the scripts discussed in this tip and start modifying them. It is possible to organize the elements of a masking solution in many different ways. For example, the implementation to mask a social security number discussed here might be helpful in creating input for an unbound form that displays social security number and other data fields for an individual. You can also obscure database values while they are at rest so that retrieving social security numbers will initially return sanitized values. A pair of scripts in the download for this tip illustrates an approach to masking and unmasking for database columns in their resting state. These scripts are not explicitly discussed in the tip, but you will be able to recognize key blocks of T-SQL code discussed in the tip.

As you build custom masking solutions for the organizations that you serve, I highly recommend that you secure key elements of the solution. The masking table is one key element. How many and which characters from a field you decide to mask is another element. You may also decide to double or even triple mask some fields to make it more difficult to reverse engineer a field's masking.

Here's a selection of references to learn more about masking and encryption. If you examine these references, you quickly discover that this tip introduces just one kind of data masking.



Last Update:


signup button

next tip button



About the author
MSSQLTips author Rick Dobson Rick Dobson is a Microsoft Certified Technical Specialist and well accomplished SQL Server and Access author.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Wednesday, July 09, 2014 - 10:48:23 PM - Rick Dobson Back To Top

In my opinion, your function is an excellent enhancement to the core ideas introduced in my article.  I strongly endorse your improvement.


Wednesday, July 09, 2014 - 1:38:18 AM - Mohammed Ali Back To Top

Thank you for this valueable article.

We need to cleanse some of our production data before delivering it to a potential vendor in order to complete a POC.

I used your concept of using a table that holds all digits and letters, the position and the mask that should be applied (Char_Mask_XREF). Then I crated the following function:

--*********************************************

CREATE  FUNCTION [fn_Mask] (@Text VARCHAR(100))  

RETURNS VARCHAR(100) 

AS  

BEGIN 

DECLARE @Mask AS VARCHAR(100)

SET @Mask= ''

 

DECLARE @Char AS CHAR(1)

DECLARE @Counter AS INT=1

DECLARE @CharReplace AS CHAR(1)

 

WHILE @Counter

BEGIN

SET @Char=SUBSTRING(@Text,@Counter,1)

SELECT @CharReplace =Mask FROM Char_Mask_XREF WHERE Position=@Counter AND OrigText=@Char

 

SET @Mask=@Mask+ ISNULL(@CharReplace,@Char)

 

SET @Counter=@Counter + 1

SET @Char=NULL

SET @CharReplace=NULL

END

 

RETURN (@Mask)

END

--*********************************************

Then I tried to update the tables in the staging area.. With something like this:

UPDATE tbl_Customers

SET Full_Name=fn_Mask(Full_name)

      ,SSN=fn_Mask(SSN)

      ,Phone_Number=fn_Mask(Phone_Number)

 

If the table has only hundreds or even few thousands of rows, the code works very well. But with a larger table the update statement may take hours and I need to update many tables that which could take days to finish.

Any ideas of how to streamline this process? I would appreciate any ideas.

 

Thank you!!


Friday, June 20, 2014 - 12:39:37 AM - Terry Back To Top

To add to the list of available tools:

* DataVeil

Comprehensive, fast and extremely easy to use data masking software. Supports SQL Server and Oracle. You can download a freeware version. The freeware does not expire and is same as the commercial version. The only difference is a masking limit of 500,000 sensitive values per run (unlimited runs, databases, projects). This can be plenty for smaller or subset databases. Short demo video at web site. http://www.dataveil.com 


Wednesday, March 19, 2014 - 11:13:42 PM - Rick Dobson Back To Top

Glad the article motivated your reply.  I hope it provokes others to share theiir thoughts with you on this topic.  I have moved on to other topics that do not provide me the opportunity to continue working on masking.  Soon I hope to be able to make some new contributions to MSQLTips.com on a new topic that is taking up most of my time.

 

 


Wednesday, March 19, 2014 - 2:07:51 PM - Adrian Back To Top

I like your solution. Maybe using a tally table could be used to avoid code duplication, i imagine something like the following. It is not tested, but i think the idea is valid.

 

DECLARE @MaskedText AS VARCHAR(50)

SET @MaskedText = ''

 

;WITH Tally  AS (SELECT TOP 50 Number = ROW_NUMBER() OVER(ORDER BY object_id ) FROM sys.columns)

 

SELECT @MaskedText= @MaskedText + MaskText

FROM Tally t

CROSS APPLY (SELECT Position = t.Number, OriginalChar = SUBSTRING(@CompleteText,t.Number,1) ) cp

INNER JOIN dbo.##Tips_Masking_Tablemt

        ONmt.PositionNumber = cp.Position AND

            mt.OriginalText = cp.OriginalChar

WHERE t.Number <= LEN(@CompleteText)

ORDER BY t.Number ASC

 

select @MaskedText

Let me know what you think. The idea is that you will send the complete text to the procedure and it will split and mask it no matter the length.                 


Sunday, March 02, 2014 - 11:55:42 PM - Virginia Back To Top

Hello Rick, and David, and all the readers.

If we are talking about data masking tools of third parties, it would be fair to list many of them and provide some basic analysis of features. Among third party tools there are the following commercial solutions:

The vendors on the market:

1 IBM Optim ( out of reach usually for people in SQL server stack),

2. Oracle data maskng pack- natively integrating into Oracle and not into SQL Server,

3. Informatica - ETL based tool for any database or files.

4.  Hush-Hush (http://mask-me.net), an SSIS components suite that provides all the benefits of Informatica with its variety of algorithms and flexibility of ETL solution, yet  natively integrates into SQL Server's SSIS Toolbox. If you are looking for scaling with parallelism, and relieving the server from the string operations (SET @P1 = SUBSTRING(@SSN_IN,1,1) etc.), and freeing yourself from figuring out algorithms and mapping tables operations,  it is the way to go.

5.  Green SQL, stand-alone software that does dynamic data masking via stored procedures

6. Voltage software that uses Format preserving encryption and were the pioneers in that concept and creators of the term -and who hold the patent for it

7.  Axis technologies that have web based interface and use services

...and many more solutions.

 

 

 


Tuesday, February 18, 2014 - 5:14:38 PM - David Friedland Back To Top

Hi Rick,

I'm not sure if you're interested in mentioning a third-party data masking tool for SQL Server, SSMS, or SSIS users, but if so, IRI makes FieldShield available as part of the IRI Workbench (Eclipse GUI) which connects to SQL Server and allows table view/edit there as well. FieldShield uses 4GL job scripts rather than SQL to (conditionally) mask columns with character obfuscation, encryption, pseudonymizatoin, hashing, blurring, tokenization and other functions for HIPAA, PCI, and other privacy law compliance inititatives. This can be done on a static or dynamic basis and apply across columns to preserve referential integrity and make data test-ready. Masking individual columns with particular 'field shields' increases security because even if one column obfuscation method is exposed, the others wouldn't necessarily be. We also have a test data generation tool for SQL Server called RowGen that does not rely on production data. Please let me know if you'd like to have a look.


Learn more about SQL Server tools