Tokenization in SQL Server: The Security Layer You Forgot

Problem

As privacy regulations tighten like General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), Payment Card Industry Data Security Standards (PCI DSS) organizations and more, there is an increased focus to protect sensitive information within databases. Tokenization is an option to adhere to those regulations. Let’s see how to implement SQL tokenization in SQL Server.

Solution

Tokenization is a powerful mechanism often overlooked and unlike encryption, which mathematically scrambles data. It replaces sensitive information with meaningless placeholders called tokens, storing the original data securely elsewhere. Tokenization complements other security techniques and helps organizations meet compliance while reducing data exposure. A well-designed vault and strict controls make it a reliable data protection strategy.

SQL Server Setup

Encryption by certificate

The first step is to prepare our database to handle encryption. We will store the original value using Transparent Data Encryption (TDE). I chose to encrypt by certificate.

-- MSSQLTips (TSQL)
 
USE [MSSQLTips]; -- change to your DB name
GO
 
 --Create database master key which will be used to encrypt the certificate private key
IF NOT EXISTS 
    (SELECT 1 FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
     CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'MSSQLTips@2025';
 
-- Create the certificate which will be used to encrypt the symmetric key
IF NOT EXISTS 
    (SELECT 1 FROM sys.certificates WHERE [name] = 'MSSQLTipsCert')
     CREATE CERTIFICATE MSSQLTipsCert 
        WITH SUBJECT = 'Certificate for Transparent Data Encryption';
 
-- Create the symmetric key
IF NOT EXISTS 
    (SELECT 1 FROM sys.symmetric_keys WHERE [name] = 'MSSQLTipsSymKey')
     CREATE SYMMETRIC KEY MSSQLTipsSymKey
        WITH ALGORITHM = AES_256
        ENCRYPTION BY CERTIFICATE MSSQLTipsCert;
GO

Testing to validate it is working:

-- MSSQLTips (TSQL)
 
OPEN SYMMETRIC KEY MSSQLTipsSymKey DECRYPTION BY CERTIFICATE MSSQLTipsCert;
 
    -- Encryption
    DECLARE @Encrypted AS varbinary(MAX) = EncryptByKey(Key_GUID('MSSQLTipsSymKey'), 'MSSQLTips');
 
    -- Decryption
    SELECT @Encrypted AS vEncrypted, CONVERT(varchar(MAX), DecryptByKey(@Encrypted)) AS vDecrypted;
 
CLOSE SYMMETRIC KEY MSSQLTipsSymKey
GO

Verify the results:

Tables

Next, we will create the table token vault. Ideally, this table should be created in a separated database to improve logical isolation with different database-level access controls. For this tip, I will simplify the process and create it in the same database. Installing this table in a separate server gives a stronger network level isolation, which can be accessed using secure linked servers or APIs. Also, there is an option to install a third-party tokenization service where tokens are exchanged via secure HTTPS or gRPC.

-- MSSQLTips (TSQL)
 
CREATE TABLE [dbo].[TokenVault](
    [TokenId] [uniqueidentifier] NOT NULL DEFAULT (newsequentialid()),
    [TokenValue] [varbinary](max) NULL,
    [CreatedAt] [datetime2](7) DEFAULT (sysutcdatetime()),
 CONSTRAINT [PK_TokenVaultEncrypted] PRIMARY KEY CLUSTERED 
(
    [TokenId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

Now I will create a table that will hold Patient data for a fictious Healthcare system:

-- MSSQLTips (TSQL)
 
CREATE TABLE [dbo].[TokenPatients](
    [PatientId] [int] IDENTITY(1,1) NOT NULL,
    [FullName] [uniqueidentifier] NULL,
    [SSN] [uniqueidentifier] NULL,    -- Social security number
    [MRN] [uniqueidentifier] NULL,    -- Medical Record Number
    [DOB] [uniqueidentifier] NULL,    -- Date of Birth
    [IsInactive] [bit] DEFAULT ((0)),
PRIMARY KEY CLUSTERED 
(
    [PatientId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

User Defined Function

This function will retrieve individual encrypted values based on the TokenId.

-- =============================================
-- Author:        SCP - MSSQLTips
-- Create date:   20250703
-- Description:   Token value from Token Vault
-- =============================================
CREATE OR ALTER FUNCTION [dbo].[ufnTokenGetValue] 
                (@TokenId uniqueidentifier)
RETURNS varbinary(MAX)
AS
BEGIN
    DECLARE  @TokenValue varbinary(MAX);
 
    SELECT         @TokenValue = [TokenValue]
        FROM     [dbo].[TokenVault]
        WHERE     [TokenId] = @TokenId;
 
    RETURN @TokenValue;
END
GO

Stored Procedures

This stored procedure will handle the sensitive information, encryption and tokenization process. I will use the following rule for the value of the PatientId:

  • If PatientID = zero it is a new record to insert
  • If PatientID is greater than zero update an existing record
  • If PatientID is negative deactivate the record, since deletions are not recommended
-- =============================================
-- Author:        SCP - MSSQLTips
-- Create date:   20250703
-- Description:   Token Patients
-- =============================================
CREATE OR ALTER   PROCEDURE [dbo].[uspTokenPatients] 
                (@PatientId int
                ,@FullName nvarchar(120)
                ,@SSN nvarchar(50)
                ,@MRN nvarchar(50)
                ,@DOB nvarchar(50))
WITH EXECUTE AS CALLER
AS
BEGIN
    SET NOCOUNT ON;
 
    IF @FullName IS NULL OR LEN(@FullName) = 0
        RETURN;
 
    BEGIN TRY
 
        -- Data encryption to be stored at the Token Vault
        OPEN SYMMETRIC KEY MSSQLTipsSymKey DECRYPTION BY CERTIFICATE MSSQLTipsCert;
 
            DECLARE @FullNameBin varbinary(MAX) = EncryptByKey(Key_GUID('MSSQLTipsSymKey'), @FullName);
            DECLARE @SSNBin varbinary(MAX) = EncryptByKey(Key_GUID('MSSQLTipsSymKey'), @SSN);
            DECLARE @MRNBin varbinary(MAX) = EncryptByKey(Key_GUID('MSSQLTipsSymKey'), @MRN);
            DECLARE @DOBBin varbinary(MAX) = EncryptByKey(Key_GUID('MSSQLTipsSymKey'), @DOB);
 
        CLOSE SYMMETRIC KEY MSSQLTipsSymKey
 
        -- Update existing patient 
        IF @PatientId > 0 BEGIN
            DECLARE  @FullNameId uniqueidentifier
                    ,@SSNId uniqueidentifier
                    ,@MRNId uniqueidentifier
                    ,@DOBId uniqueidentifier;
 
            SELECT         @FullNameId = [FullName]
                        ,@SSNId = [SSN]
                        ,@MRNId = [MRN]
                        ,@DOBId = [DOB]
                FROM     [dbo].[TokenPatients]
                WHERE     [PatientId] = @PatientId;
 
            IF [dbo].[ufnTokenGetValue] (@FullNameId) <> @FullNameBin
                UPDATE       [dbo].[TokenVault] 
                    SET      [TokenValue] = @FullNameBin 
                    WHERE    [TokenId] = @FullNameId;
 
            IF [dbo].[ufnTokenGetValue] (@SSNId) <> @SSNBin
                UPDATE       [dbo].[TokenVault] 
                    SET      [TokenValue] = @SSNBin 
                    WHERE    [TokenId] = @SSNId;
 
            IF [dbo].[ufnTokenGetValue] (@MRNId) <> @MRNBin
                UPDATE       [dbo].[TokenVault] 
                    SET      [TokenValue] = @MRNBin 
                    WHERE    [TokenId] = @MRNId;
 
            IF [dbo].[ufnTokenGetValue] (@DOBId) <> @DOBBin
                UPDATE       [dbo].[TokenVault] 
                    SET      [TokenValue] = @DOBBin 
                    WHERE    [TokenId] = @DOBId;
 
        END
 
        -- Inserting a new patient
        IF @PatientId = 0 BEGIN 
            DECLARE         @OutputTable
                TABLE    (Token uniqueidentifier);
 
            INSERT INTO  [dbo].[TokenVault] 
                        ([TokenValue]) 
                OUTPUT   INSERTED.TokenId 
                INTO     @OutputTable
                VALUES  (@FullNameBin);
 
            INSERT INTO  [dbo].[TokenVault] 
                        ([TokenValue]) 
                OUTPUT   INSERTED.TokenId 
                INTO     @OutputTable
                VALUES  (@SSNBin);
 
            INSERT INTO  [dbo].[TokenVault] 
                        ([TokenValue]) 
                OUTPUT   INSERTED.TokenId 
                INTO     @OutputTable
                VALUES  (@MRNBin);
 
            INSERT INTO  [dbo].[TokenVault] 
                        ([TokenValue]) 
                OUTPUT   INSERTED.TokenId 
                INTO     @OutputTable
                VALUES  (@DOBBin);
 
            SELECT         @FullNameId = (SELECT         Token 
                                            FROM     @OutputTable 
                                            ORDER BY (SELECT NULL) OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY)
                        ,@SSNId = (SELECT     Token 
                                    FROM     @OutputTable 
                                    ORDER BY (SELECT NULL) OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY)
                        ,@MRNId = (SELECT     Token 
                                    FROM     @OutputTable 
                                    ORDER BY (SELECT NULL) OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY)
                        ,@DOBId = (SELECT     Token 
                                    FROM     @OutputTable 
                                    ORDER BY (SELECT NULL) OFFSET 3 ROWS FETCH NEXT 1 ROWS ONLY);
 
            INSERT INTO  [dbo].[TokenPatients]
                        ([FullName]
                        ,[SSN]
                        ,[MRN]
                        ,[DOB])
                VALUES    (@FullNameId
                        ,@SSNId
                        ,@MRNId
                        ,@DOBId);
 
        END
 
        -- Delete patient
        IF @PatientId < 0 BEGIN
            UPDATE         [dbo].[TokenPatients]
                SET         [IsInactive] = ~[IsInactive]
                WHERE     [PatientId] = ABS(@PatientId);
        END
 
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            BEGIN
                ROLLBACK TRANSACTION;
            END
            
        -- Print error information. 
        PRINT    'Error: ' + CONVERT(varchar(50), ERROR_NUMBER()) +
                ', Severity: ' + CONVERT(varchar(5), ERROR_SEVERITY()) +
                ', State: ' + CONVERT(varchar(5), ERROR_STATE()) + 
                ', Procedure: ' + ISNULL(ERROR_PROCEDURE(), '-') + 
                ', Line: ' + CONVERT(varchar(5), ERROR_LINE());
        PRINT ERROR_MESSAGE();
    END CATCH;
END
GO

This stored procedure will return the real values for a Patient:

-- =============================================
-- Author:        SCP - MSSQLTips
-- Create date:   20250703
-- Description:   Patient real values
-- =============================================
CREATE OR ALTER   PROCEDURE [dbo].[uspTokenPatientChk] 
                (@PatientId int)
WITH EXECUTE AS CALLER
AS
BEGIN
    SET NOCOUNT ON;
 
    OPEN SYMMETRIC KEY MSSQLTipsSymKey DECRYPTION BY CERTIFICATE MSSQLTipsCert;
 
        SELECT       [PatientId]
                    ,CONVERT(nvarchar(MAX), DecryptByKey([dbo].[ufnTokenGetValue] ([FullName]))) AS [FullName]
                    ,CONVERT(nvarchar(MAX), DecryptByKey([dbo].[ufnTokenGetValue] ([SSN]))) AS [SSN]
                    ,CONVERT(nvarchar(MAX), DecryptByKey([dbo].[ufnTokenGetValue] ([MRN]))) AS [MRN]
                    ,CONVERT(nvarchar(MAX), DecryptByKey([dbo].[ufnTokenGetValue] ([DOB]))) AS [DOB]
                    ,[IsInactive]
            FROM     [dbo].[TokenPatients]
            WHERE     [PatientId] = @PatientId;
 
    CLOSE SYMMETRIC KEY MSSQLTipsSymKey
END
GO

Testing the solution

Insert data for a new patient:

-- MSSQLTips (TSQL)
 
EXEC    [dbo].[uspTokenPatients]
        @PatientId = 0,
        @FullName = N'George Washington',
        @SSN = N'123-45-1111',
        @MRN = N'55VT32',
        @DOB = N'1732-02-22';
GO
 
SELECT * FROM [dbo].[TokenPatients]
GO
SELECT * FROM [dbo].[TokenVault]
GO

Resulting in:

Token map to Vault

Checking the values:

-- MSSQLTips (TSQL)
 
EXEC    [dbo].[uspTokenPatientChk]
        @PatientId = 1
GO

Resulting in:

Example of insert

Changing the SSN value for a Patient:

-- MSSQLTips (TSQL)
 
EXEC    [dbo].[uspTokenPatients]
        @PatientId = 1,    -- Type the Patient Id number to be updated
        @FullName = N'George Washington',
        @SSN = N'123-45-6789', -- Change from 123-45-1111
        @MRN = N'55VT32',
        @DOB = N'1732-02-22';
GO
 
EXEC    [dbo].[uspTokenPatientChk]
        @PatientId = 1
GO

Resulting in:

Example of update

About the tables

Token Patients

-- MSSQLTips (TSQL)
 
SELECT * FROM [dbo].[TokenPatients]
GO

This table contains mapping values and is not a concern if the table is compromised because it shows only references to the real values stored in the secure vault.

Patients table

Token Vault

-- MSSQLTips (TSQL)
 
SELECT * FROM [dbo].[TokenVault]
GO

This tables contains a reference for an encrypted value and if stored in another database or location, it does not have the encryption certificate to decrypt it, and it is secure if compromised.

Vault table

Frequently Asked Questions

What is tokenization?

Tokenization is a data protection technique that substitutes sensitive data with a non-sensitive substitute value. The values called tokens carry no exploitable data. The mapping between the token and the original data is stored in a secure, access-controlled token vault. SQL Tokenization in SQL Server is useful for storing payment data, anonymizing personal health information, protecting personal identifiers like names, phone numbers, SSNs, etc. and securing audit trails.

What is anonymization?

Anonymization is the technique to transform personal data into anonymous information in such a way that a data subject can no longer be identified directly or indirectly.

What is encryption?

Encryption is a technique to encode data using an algorithm and key. Using encryption is flexible but exposes the original data if the key is compromised. Encryption implementations require encryption and decryption when data is accessed, which uses additional CPU and memory resources.

What is Personally Identifiable Information (PII)?

Personally Identifiable Information (PII) is any data that can directly or indirectly identify a person or a household.

Sensitive Personal Information

Sensitive Personal Information is any information that can determine personal preferences, racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, genetic data, health-related data, and data concerning a person´s sexual orientation.

What are the benefits of tokenization?

Data Security – Removes all sensitive data elements from databases reducing the vulnerability for data breaches.
Digital Payments – A tokenization system can substitute a payment token for credit card account or a primary account number reducing fraud.
Natural Language Processing – Tokenization is a preprocessing technique that processes text in linguistic units transforming into smaller tokens that algorithms can understand.

What is the tokenization process?

Identification: Identify the sensitive data that needs tokenization.
Generation: Generates a random token.
Mapping: The generated token and the original data are mapped to each other in a secure token database called a token vault which is encrypted and store separately from the main database. It also can be saved in a third-party tokenization provider.
Replacement: The sensitive data is replaced by the token.
Storage: The original data is securely stored in the token vault, accessible only through the tokenization system.

What are the types of tokens?

Irreversible: Data cannot be converted back to their original data values.
Reversible: Through detokenization the data can be converted back into their original values.
Format-preserving: Consistent format as the replaced data.

What are the tokenization approaches?

Vault-base: A database holds the token and its original sensitive value. The vault must be highly protected, each tokenization requires a database lookup, and should be optimized. It is used in highly regulated environments.
Deterministic: An algorithm is used to always produce the same token for the same input. It is less secure than a randomized value due to frequency analysis and dictionary attacks. It is used for tokens that must be searchable or joinable.
Randomized: Produces a different token each time and requires secure lookup for detokenization.
Custom Functions: Defined by the tokenization logic tailored to your needs. It can have higher development cost, harder to audit or maintain, and risk of weak security, if poorly implemented.

What are your recommendations when designing a tokenized database?

Data Scope: Identify what data needs to be tokenized, which data elements are regulated, where the data will be stored, who will access the data and why.
Strategy: Choose your tokenization method as outlined above.
Vault: Choose a method to encrypt the real values. Setup the vault in a different database, and if needed for auditing purposes, log each token generation event.
Functions: Plan how to tokenize, detokenize and a purge process to remove all mappings. If needed, define how to retokenize the data which is a process to rotate or change the token mappings.
Security: Build a process to encrypt the real values using always encrypted or transparent data encryption (TDE) and role-based access. Plan the access control of the vault enabling auditing and follow the best practices for security.
Performance: Tune the detokenization process to improve response time.
Regulatory: Ensure the tokenization process meets the necessary compliance regulations that apply to your business.
Backup: Create a backup strategy for the token vault and its encryption keys. Test to validate the token mapping can be restored reliably.
Integration: Tokenize the data during entry and detokenize the data only when explicitly authorized.
Validation: Verify token consistency, permission controls, query correctness, encryption in backups, and simulate data breach scenarios to understand the potential data exposure.

Next Steps

2 Comments

  1. I don’t understand the PatientID column. Is it an actual column in the table? Can you provide the table schema? What if there is now column with unique values? Do you still need a column with unique values, even though you don’t want to tokenize that column. How would the PatientID column have a zero? I really like the idea of tokenization and appreciate your article.

    • Hello David, thanks for asking this important detail about the tokenization process. In a healthcare database the main table is the Patient one where the details about them will be saved in this table that has the key column that I called PatientId, and by design I used an integer column starting with 1. Other tables like patient diseases, allergies, genetics, and others will be referenced by Patient Id column. This is a very important detail about tokenization that tipically means replace sensitive data with a surrogate token mapped between the original value and the token kept in a secure lookup table managed by a tokenization service. Tokenize a key column, which by itself it is not a Personal sensitive data, does not worth once will break the relational integrity, making the joins and indexes inefficient, and will add a massive complexity to the database. In summary, tokenize only sensitive data columns, not primary or foreign keys. My very best regards, Sebatião

Leave a Reply

Your email address will not be published. Required fields are marked *