By: Aaron Bertrand | Comments (1) | Related: 1 | 2 | 3 | > SQL Server 2016
Problem
In my previous tip, Advanced JSON Techniques in SQL Server 2016 - Part 1, I gave examples of three new functions to support JSON features: OPENJSON(), ISJSON(), and JSON_VALUE(). In this tip, I wanted to show how you can use JSON_VALUE() to index particular data points within a JSON string.
Solution
Let's say you have a simple table to store data about customers, and because different customers can have different attributes, you've chosen to store those attributes in an NVARCHAR column using JSON structure. The original table is pretty simple:
CREATE TABLE dbo.Customers ( CustomerID INT NOT NULL, DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(), [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(), Attributes NVARCHAR(4000) CONSTRAINT PK_Cust_A PRIMARY KEY(CustomerID), CONSTRAINT IsValidJSON_A CHECK (ISJSON(Attributes) = 1) );
A sample row would have the following JSON structure in the Attributes column:
{"HairColor" : "red", "EyeColor : "green", "YearJoined" : 1974 }
Now, if I want to find all the customers with red hair, I would run a query like this:
SELECT CustomerID, JSON_VALUE(Attributes, '$.HairColor') FROM dbo.Customers_A WHERE JSON_VALUE(Attributes, '$.HairColor') = N'red';
This, of course, would have to scan the entire table, according to SQL Sentry Plan Explorer:
A little trick I've learned from Jovan Popovic, though, is that you can create a computed column from JSON_VALUE(), and index the computed column. So let's create a new copy of the table:
CREATE TABLE dbo.Customers_B ( CustomerID INT NOT NULL, DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(), [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(), Attributes NVARCHAR(4000), HairColor AS JSON_VALUE(Attributes, '$.HairColor'), CONSTRAINT PK_Cust_B PRIMARY KEY(CustomerID), CONSTRAINT IsValidJSON_B CHECK (ISJSON(Attributes) = 1) );
And then we can create this index:
CREATE INDEX HairColor ON dbo.Customers_B(HairColor);
Note that you'll get a warning here, because SQL Server has no way to know how long a specific JSON attribute will be - it could, theoretically, be almost the entire column:
Now we can make the query simpler:
SELECT CustomerID, HairColor FROM dbo.Customers_B WHERE HairColor = N'red';
And now we are likely to get a much more efficient index seek:
We also get the same plan for the original query, so SQL Server is already doing a great job of recognizing the JSON_VALUE() arguments and matching it to our indexed, computed column.
We could also choose to make the column persisted, which can mean less work for queries at the cost of more work for DML:
CREATE TABLE dbo.Customers_C ( CustomerID INT NOT NULL, DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(), [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(), Attributes NVARCHAR(4000), HairColor AS JSON_VALUE(Attributes, '$.HairColor') PERSISTED, CONSTRAINT PK_Cust_C PRIMARY KEY(CustomerID), CONSTRAINT IsValidJSON_C CHECK (ISJSON(Attributes) = 1) ); GO CREATE INDEX HairColor ON dbo.Customers_C(HairColor);
This leads again to the same plan and, at least at my small data volumes, similar performance. At scale, though, persisted computed columns are worth considering:
If you want to avoid the warning from above, or - more importantly - failures from inserting JSON strings with extremely long values for the HairColor attribute - you can create this index instead:
CREATE TABLE dbo.Customers_D ( CustomerID INT NOT NULL, DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(), [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(), Attributes NVARCHAR(4000), HairColor AS JSON_VALUE(Attributes, '$.HairColor') PERSISTED, CONSTRAINT PK_Cust_D PRIMARY KEY(CustomerID), CONSTRAINT IsValidJSON_D CHECK (ISJSON(Attributes) = 1) ); GO CREATE INDEX HairColor ON dbo.Customers_D(CustomerID) INCLUDE(HairColor);
This works, however since the HairColor value is no longer in the key for the index, there is no way for SQL Server to satisfy this specific query without performing a scan - though it chooses this skinnier index instead of the wider clustering key:
Also note that I tried creating a filtered index, since, wouldn't it be nice to be able to create a smaller index that only has the rows we might care about? Alas, I tried this:
CREATE INDEX HairColor2 ON dbo.Customers_B(HairColor) WHERE HairColor = N'red';
And was promptly told:
Filtered index 'HairColor2' cannot be created on table 'dbo.Customers_B' because the column 'HairColor' in the filter expression is a computed column. Rewrite the filter expression so that it does not include this column.
Summary
JSON structures can be stored in SQL Server, but by default they just look like big blobs of text to the optimizer. You can use computed columns and indexes to make operations for extracting and searching specific JSON elements more efficient.
Next Steps
- Download SQL Server 2016 CTP3 when it becomes available (or register for a trial of Azure SQL Database, where this feature will also be available).
- Make sure your database compatibility level is 130; during the CTP period at least, JSON features may not function under older compatibility levels.
- Try out JSON in scenarios where it may seem useful.
- See these related tips and other resources:
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips