Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Advanced JSON Techniques in SQL Server 2016 - Part 2


By:   |   Read Comments (1)   |   Related Tips: More > SQL Server 2016

Attend this free live MSSQLTips webcast

Tips and Tricks to Growing Your SQL Server Estate from SMB to Enterprise Scale
Thursday, June 28, 2018 - click here to learn more


Problem

In my previous tip, Advanced JSON Techniques in SQL Server 2016 - Part 1, I gave examples of three new functions to support JSON features: OPENJSON(), ISJSON(), and JSON_VALUE(). In this tip, I wanted to show how you can use JSON_VALUE() to index particular data points within a JSON string.

Solution

Let's say you have a simple table to store data about customers, and because different customers can have different attributes, you've chosen to store those attributes in an NVARCHAR column using JSON structure. The original table is pretty simple:

CREATE TABLE dbo.Customers
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000)
  CONSTRAINT PK_Cust_A PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_A CHECK (ISJSON(Attributes) = 1)
);

A sample row would have the following JSON structure in the Attributes column:

{"HairColor" : "red", "EyeColor : "green", "YearJoined" : 1974 }

Now, if I want to find all the customers with red hair, I would run a query like this:

SELECT CustomerID, JSON_VALUE(Attributes, '$.HairColor')
  FROM dbo.Customers_A
  WHERE JSON_VALUE(Attributes, '$.HairColor') = N'red'; 

This, of course, would have to scan the entire table, according to SQL Sentry Plan Explorer:

Table scan when searching for a particular value with JSON

A little trick I've learned from Jovan Popovic, though, is that you can create a computed column from JSON_VALUE(), and index the computed column. So let's create a new copy of the table:

CREATE TABLE dbo.Customers_B
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000),
  HairColor AS JSON_VALUE(Attributes, '$.HairColor'),
  CONSTRAINT PK_Cust_B PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_B CHECK (ISJSON(Attributes) = 1)
);

And then we can create this index:

CREATE INDEX HairColor ON dbo.Customers_B(HairColor);

Note that you'll get a warning here, because SQL Server has no way to know how long a specific JSON attribute will be - it could, theoretically, be almost the entire column:

Warning! The maximum key length for a nonclustered index is 1700 bytes. The index 'HairColor' has maximum length of 8000 bytes. For some combination of large values, the insert/update operation will fail.

Now we can make the query simpler:

SELECT CustomerID, HairColor 
   FROM dbo.Customers_B
   WHERE HairColor = N'red';

And now we are likely to get a much more efficient index seek:

Index Seek with an index created on the JSON_VALUE column

We also get the same plan for the original query, so SQL Server is already doing a great job of recognizing the JSON_VALUE() arguments and matching it to our indexed, computed column.

We could also choose to make the column persisted, which can mean less work for queries at the cost of more work for DML:

CREATE TABLE dbo.Customers_C
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000),
  HairColor AS JSON_VALUE(Attributes, '$.HairColor') PERSISTED,
  CONSTRAINT PK_Cust_C PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_C CHECK (ISJSON(Attributes) = 1)
);
GO

CREATE INDEX HairColor ON dbo.Customers_C(HairColor);

This leads again to the same plan and, at least at my small data volumes, similar performance. At scale, though, persisted computed columns are worth considering:

Query plan with a computed column with the JSON data

If you want to avoid the warning from above, or - more importantly - failures from inserting JSON strings with extremely long values for the HairColor attribute - you can create this index instead:

CREATE TABLE dbo.Customers_D
(
  CustomerID INT NOT NULL,
  DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
  [GUID] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
  Attributes NVARCHAR(4000),
  HairColor AS JSON_VALUE(Attributes, '$.HairColor') PERSISTED,
  CONSTRAINT PK_Cust_D PRIMARY KEY(CustomerID),
  CONSTRAINT IsValidJSON_D CHECK (ISJSON(Attributes) = 1)
);
GO

CREATE INDEX HairColor ON dbo.Customers_D(CustomerID) INCLUDE(HairColor);

This works, however since the HairColor value is no longer in the key for the index, there is no way for SQL Server to satisfy this specific query without performing a scan - though it chooses this skinnier index instead of the wider clustering key:

Index Scan on JSON computed column when the data is an INCLUDED column

Also note that I tried creating a filtered index, since, wouldn't it be nice to be able to create a smaller index that only has the rows we might care about? Alas, I tried this:

CREATE INDEX HairColor2 ON dbo.Customers_B(HairColor) WHERE HairColor = N'red';

And was promptly told:

Msg 10609, Level 16, State 1
Filtered index 'HairColor2' cannot be created on table 'dbo.Customers_B' because the column 'HairColor' in the filter expression is a computed column. Rewrite the filter expression so that it does not include this column.
Summary

JSON structures can be stored in SQL Server, but by default they just look like big blobs of text to the optimizer. You can use computed columns and indexes to make operations for extracting and searching specific JSON elements more efficient.

Next Steps


Last Update:


next webcast button


next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand is a Senior Consultant at SQL Sentry, Inc., and has been contributing to the community for about two decades, first earning the Microsoft MVP award in 1997.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Wednesday, November 04, 2015 - 7:58:35 PM - Ian Yates Back To Top

Out of curiosity, what if you tried to create the filtered index against table _C rather than _B?  The difference being that "C" has the column persisted.  I don't have SQL 2016 installed at the moment to try it out unfortunately.  

Thanks for the post - the new JSON stuff looks like it'll be quite handy.  Do you know if it has full support in the Express edition too?


Learn more about SQL Server tools