JSON Data Type in Azure SQL Database

Problem

We have data coming into our database from a REST API endpoint. The data is formatted as JSON documents. Is there an efficient way to handle JSON data within the SQL Server ecosystem? In this article, we look at the new JSON data types for SQL Server.

Solution

JSON data has been supported since SQL Server 2016. Check out this tip – Introducing JSON for SQL Server 2016. In that release, functions were provided to work with data in the JSON format, such as:

SQL Server 2022 expanded the JSON functionality with the functions JSON_PATH_EXISTS, JSON_OBJECT, and JSON_ARRAY. Those last two functions are described in the tip json_object and json_array Functions SQL Server 2022 Tutorial.

There’s already some JSON functionality packed into SQL Server, but the JSON data itself was always stored as a string, typically in a NVARCHAR(MAX) column. Unlike XML, there was no native data type for JSON. But recently, a preview of a native JSON data type was introduced in Azure SQL Database. You can find more info in an episode of Data Exposed.

In this tip, we’ll take a closer look at this new data type. Keep in mind that at the time of writing (December 2024), the data type was in public preview and it was only available in Azure SQL DB (and Fabric SQL DB as well). Also, note that the AdventureWorksLT sample database is used in the tip examples.

The JSON Data Type

Create a Table with a JSON Column

Using the JSON data type is simple; you define a column with the correct data type:

CREATE TABLE dbo.JsonTest (Id INT IDENTITY(1,1), JsonData JSON);

Unlike XML, there are no special indexes you can create for a column with the JSON data type. However, using the JSON_VALUE function, you can create a work around by using it in a computed column and then creating an index on that column. This technique is explained in the tip Advanced JSON Techniques in SQL Server – Part 2. We can generate sample JSON data using the customer table:

generate json sample data

Create JSON Data

There are multiple methods to create JSON data in SQL Server, but the easiest option is FOR JSON PATH. The option WITHOUT_ARRAY_WRAPPER is used to remove the square brackets around the generated JSON. Because we want a small JSON document for each customer instead of one large JSON file, a subquery is used so that FOR JSON PATH is executed for each individual row. We can insert the data into the table with the JSON column with the following SQL statement:

INSERT INTO dbo.JsonTest(JsonData)
SELECT
    (
    SELECT
         CustomerID
        ,Title          AS [Name.Title]
        ,FirstName      AS [Name.FirstName]
        ,MiddleName     AS [Name.MiddleName]
        ,LastName       AS [Name.LastName]
        ,Suffix         AS [Name.Suffix]
        ,CompanyName    AS [Company.CompanyName]
        ,SalesPerson    AS [Company.SalesPerson]
        ,EmailAddress   AS [ContactInfo.EmailAddress]
        ,Phone          AS [ContactInfo.Phone]
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
    ) myjson
FROM [SalesLT].[Customer];

Any existing JSON function can work with the new JSON data type, which means existing SQL code doesn’t need to be adapted. There’s one exception though (although this might be removed in the future): the OPENJSON function does not accept the new JSON data type. You need to explicitly convert to NVARCHAR(MAX) first, as documented in the limitations.

JSON_VALUE Function to Filter Data

We can use for example the JSON_VALUE function on the JSON column to filter the data:

SELECT *
FROM dbo.JsonTest
WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David'
json functions and the new json data type

Performance Comparison – JSON Data Type vs. NVARCHAR(MAX)

The documentation lists several benefits of using the new JSON data type instead of using (N)VARCHAR. One is more efficient storage.

Let’s test this out by creating two tables. The first uses the JSON data type, the other NVARCHAR(MAX).

DROP TABLE IF EXISTS dbo.JsonTest_Native;
CREATE TABLE dbo.JsonTest_Native
(Id INT IDENTITY(1,1)
,JsonData JSON);
 
DROP TABLE IF EXISTS dbo.JsonTest_Varchar;
CREATE TABLE dbo.JsonTest_Varchar
(Id INT IDENTITY(1,1)
,JsonData NVARCHAR(MAX));

With the following script, we can insert sample data into both tables. The GO operator is used to run the script in 1,000 batches (which means this script can only run using SSMS or SQLCMD).

INSERT INTO dbo.JsonTest_Native(JsonData)
SELECT
    (
    SELECT
         CustomerID
        ,Title          AS [Name.Title]
        ,FirstName      AS [Name.FirstName]
        ,MiddleName     AS [Name.MiddleName]
        ,LastName       AS [Name.LastName]
        ,Suffix         AS [Name.Suffix]
        ,CompanyName    AS [Company.CompanyName]
        ,SalesPerson    AS [Company.SalesPerson]
        ,EmailAddress   AS [ContactInfo.EmailAddress]
        ,Phone          AS [ContactInfo.Phone]
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
    ) myjson
FROM [SalesLT].[Customer];
GO 1000 -- about 8 minutes on a SQL DB with DTU S0
 
INSERT INTO dbo.JsonTest_Varchar(JsonData)
SELECT CONVERT(NVARCHAR(MAX),JsonData) AS JsonVar FROM dbo.JsonTest_Native;
-- little over 9 minutes on S0

Check Table Size

With the system stored procedure sp_spaceused, we can check the sizes of both tables:

sp_spaceused results

We can see that the table using the native JSON data type is about 400MB in size, while the other table is 488MB in size, which saves about 18% in storage. Data compression doesn’t seem to have any effect on the first table, nor on the second table (because NVARCHAR(MAX) is used in the rows that are stored off-table). If it’s known that the JSON document size will always be small, you can opt to go for a smaller width of the column (for example NVARCHAR(500)) and then row/page compression might have a big impact. The documentation advises trying out clustered columnstore indexes when you’re using the JSON in very big tables.

Read Comparison

Another benefit mentioned by the documentation are faster reads because the documents are already parsed. Using our big tables, the following queries are executed:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
SELECT COUNT(1) 
FROM dbo.JsonTest_Native
WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David';
 
SELECT COUNT(1) 
FROM dbo.JsonTest_Varchar
WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David';

The optimizer believes the first query will have a lower cost (45%) than the second one (55%):

query plans SELECT

With statistics for time and IO enabled, we get the following results:

logical reads for both queries

The query with the JSON data type has to do less logical reads and finishes a bit quicker than the second query. To get more statistical correct results, we’re going to execute the statements multiple times and write the durations to a log table:

DROP TABLE IF EXISTS dbo.jsontestlog;
 
CREATE TABLE dbo.jsontestlog(
     Iteration          INT NOT NULL
    ,DurationNative     INT NOT NULL
    ,DurationVarchar    INT NOT NULL
);
 
DECLARE @timestamp1 DATETIME2(3);
DECLARE @timestamp2 DATETIME2(3);
DECLARE @timestamp3 DATETIME2(3);
DECLARE @iterator INT = 1;
 
WHILE (@iterator <= 30)
BEGIN
    SET @timestamp1 = SYSDATETIME();
    SELECT COUNT(1) 
    FROM dbo.JsonTest_Native
    WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David';
    SET @timestamp2 = SYSDATETIME();
    SELECT COUNT(1) 
    FROM dbo.JsonTest_Varchar
    WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David';
    SET @timestamp3 = SYSDATETIME();
    INSERT INTO dbo.jsontestlog
    (
         Iteration
        ,DurationNative
        ,DurationVarchar
    )
    VALUES (@iterator
           ,DATEDIFF(MILLISECOND,@timestamp1,@timestamp2)
           ,DATEDIFF(MILLISECOND,@timestamp2,@timestamp3)
        );
    SET @iterator += 1;
END

I accidentally executed the queries 116 times because I forgot to increment the iterator at the end of the loop (resulting in an infinite loop). After a while, I realized the script was taking too long and I cancelled it. Here are my results:

results of looping

This means we got about 4% performance improvement, just by switching the data type. Keep in mind, this is just one single query on one single data set. Its possible different results are achieved in other scenarios.

Faster Writes

Finally, another benefit is faster writes, because the query should be able to update individual values without accessing the entire document. We can try this out with the following queries:

UPDATE dbo.JsonTest_Native
SET JsonData = JSON_MODIFY(JsonData,'$.UpdateCounter',1)
WHERE id IN (
   SELECT Id
   FROM dbo.JsonTest_Native
   WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David'
);
GO
 
UPDATE dbo.JsonTest_Varchar
SET JsonData = JSON_MODIFY(JsonData,'$.UpdateCounter',1)
WHERE id IN (
   SELECT Id
   FROM dbo.JsonTest_Varchar
   WHERE JSON_VALUE(JsonData,'$.Name.FirstName') = 'David'
);

Again, the optimizer believes the first query will have a lower cost (46%):

query plans for update

However, if we do single execution and look at the IO and time statistics, we can see the first query is slower but also must do more work!

logical reads for update queries

In particular, the worktable has double the amount of work. This might be caused by the split operator, which changes the update to a delete and insert. When we run our test script again for 30 iterations (replacing the SELECT statements with the UPDATE statements), we get the following results:

average results for update

On average, updating the column with the JSON data type was about 41% slower (11.6 seconds compared to 8.3 seconds)! Again, this is one single use case and one single query. Results can be different in other scenarios. Also keep in mind this feature is still in preview.

Next Steps

  • If you want to learn more about JSON in SQL Server, check out the documentation.
  • At the time of writing, this feature is in preview for Azure SQL DB. This also means it’s easy to try out if you have an Azure subscription.
  • You can find an overview of all tips about SQL Server functions here.

Leave a Reply

Your email address will not be published. Required fields are marked *