By: Aaron Bertrand | Updated: 2018-10-12 | Comments (1) | SQL Server 2019
Microsoft added a task in SQL Server Management Studio 17.5 called "Classify Data." The purpose is to help identify columns that are potentially sensitive in nature, and that may need to be protected in order to remain compliant under various regulations, including SOX, HIPAA, PCI, and GDPR. It does a pretty good job of identifying vulnerable columns and helping you classify them depending on the risk and the type of exposure they present. But aside from showing you the results of this classification, this feature does not raise any additional visibility to these sensitive columns, never mind any suspicious access to them.
For background on the functionality introduced in SQL Server 2017 and SSMS 17.5, imagine we have this table:
CREATE TABLE dbo.Contractors ( ContractorID int, FirstName sysname, LastName sysname, SSN char(9), Email varchar(320), PasswordHash varbinary(256), HourlyRate decimal(6,2) );
When I run the wizard, I get the following recommendations:
You might not agree with all of the classifications, so you can change them, and even can add your own, specifying various information types and sensitivity levels. These are:
When you select the columns you want to classify, click Accept, and then Save, it adds extended properties around those columns. The report in SSMS uses those extended properties, and displays them as follows:
You could of course write your own app that consumes them, and maybe there are already tools out there that do, but nothing really happens here inside of SQL Server, except these extended properties exist and they surface the relevant columns on the report.
SQL Server SENSITIVITY CLASSIFICATION
In SQL Server 2019, there is a lot more automation built into the system, allowing you to easily add sensitive data labels to columns, which will get pulled into audits by default. Let's take the same table above, and add our own sensitivity classifications similar to the ones we added above with extended properties, using the ADD SENSITIVITY CLASSIFICATION command (already available in Azure SQL Database):
ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.FirstName, dbo.Contractors.LastName WITH (LABEL = 'Confidential - GDPR', INFORMATION_TYPE = 'Contact Info'); ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.SSN WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'National ID'); ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.email, dbo.Contractors.PasswordHash WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Credentials'); ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.HourlyRate WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'Financial');
These don't create extended properties, but rather we can see these in sys.sensitivity_classifications:
If we are creating an audit, we don't have to do anything special to pick up these classifications, except be auditing the table (in other words, existing audits will simply start inheriting these classifications as they are added). So, if we have a server audit:
USE master; GO CREATE SERVER AUDIT GDPRAudit TO FILE (FILEPATH = 'C:\temp\Audit\'); GO ALTER SERVER AUDIT GDPRAudit WITH (STATE = ON); GO
Then a database audit that is monitoring read activity on the table:
USE HR; GO CREATE DATABASE AUDIT SPECIFICATION AuditContractors FOR SERVER AUDIT GDPRAudit ADD (SELECT ON dbo.Contractors BY dbo) WITH (STATE = ON);
With the audit enabled, if we run a couple of queries:
SELECT * FROM dbo.Contractors; SELECT FirstName, LastName, HourlyRate FROM dbo.Contractors;
We can observe access to specific types of information this way, with the new column data_sensitivity_information that is included in the audit:
SELECT session_server_principal_name, event_time, [host_name], [object] = [database_name] + '.' + [schema_name] + '.' + [object_name], [statement], data_sensitivity_information = CONVERT(xml, data_sensitivity_information) FROM sys.fn_get_audit_file ('c:\temp\Audit\GDPRAudit_*.sqlaudit', default, default) WHERE action_id = 'SL'; -- SELECT
You can click on any XML column value to see these results:
-- from the first row, with SELECT *: <sensitivity_attributes> <sensitivity_attribute label="Confidential - GDPR" information_type="Contact Info" /> <sensitivity_attribute label="Highly Confidential" information_type="National ID" /> <sensitivity_attribute label="Confidential" information_type="Credentials" /> <sensitivity_attribute label="Highly Confidential" information_type="Financial" /> </sensitivity_attributes> -- from the second row, with specific columns: <sensitivity_attributes> <sensitivity_attribute label="Confidential - GDPR" information_type="Contact Info" /> <sensitivity_attribute label="Highly Confidential" information_type="Financial" /> </sensitivity_attributes>
It is interesting to note that
LastName combined to yield only one element in the
XML. This means that whether you have one or 50 columns with a specific label and
information type combination, it won’t add any unnecessary data to the audit.
Anyway, for those of you that are XML-savvy, you can likely see how easy it would
be to extract that information as nodes and then run queries against it, making
it very easy to identify all the accesses to specific labels or label/type combinations.
I feel like there might be a market for a conversion tool that would translate these data classifications you may have already invested in (or built tools that consume) from extended properties to sensitivity classifications, or vice-versa. I'm not going to handle synchronization here, but I'll offer up a start, taking the extended properties and building dynamic statements to migrate them:
DECLARE @sql nvarchar(max) = N''; SELECT @sql += N'ADD SENSITIVITY CLASSIFICATION TO ' + QUOTENAME(s.name) + QUOTENAME(o.name) + QUOTENAME(c.name) + ' WITH (LABEL = ''' + REPLACE(CONVERT(nvarchar(256), l.value), '''', '''''') + ''', INFORMATION_TYPE = ''' + REPLACE(CONVERT(nvarchar(256), t.value), '''', '''''') + ''');' + CHAR(13) + CHAR(10) FROM sys.extended_properties AS t INNER JOIN sys.extended_properties AS l ON t.class = l.class AND t.major_id = l.major_id AND t.minor_id = l.minor_id INNER JOIN sys.objects AS o ON t.major_id = o.[object_id] INNER JOIN sys.columns AS c ON t.major_id = c.[object_id] AND t.minor_id = c.column_id INNER JOIN sys.schemas AS s ON o.[schema_id] = s.[schema_id] WHERE t.name = N'sys_information_type_name' AND l.name = N'sys_sensitivity_label_name'; PRINT @sql;
That's just a quick exercise. To keep things relatively consistent between
the SSMS report and the audit (and any other tool that consumes either or both),
and potentially between different versions of SQL Server in your environment, you'll
also want to work out the opposite: how to convert sensitivity classifications to
There is still going to be some subjectivity here; only you can truly know how sensitive each entity in your data model might be. But SQL Server continues to introduce innovate features that help you protect your sensitive data and remain compliant with various standards and regulations.
Read on for related tips and other resources:
- My overview of features in SQL Server 2019 CTP 2.0
- All SQL Server 2019 Tips
- SQL Server 2019 (Microsoft)
- Watch the SQL Server Blog for more SQL Server 2019 info
- SQL Server Management Studio 2017 Data Discovery and Categorization
- Is your SQL Server environment ready for GDPR?
- SQL Server Management Studio 2017 Data Discovery and Categorization
Last Updated: 2018-10-12
About the author
View all my tips