By: Rajendra Gupta | Comments (8) | Related: > Compression
Problem
Data compression has been available natively since SQL Server 2014 with row level and column level compression, but this was done by enabling compression at either the page level or row level. In SQL Server 2016 CTP 3.1, we have newly introduced functions for compression by which specific data itself can be compressed. In this tip we are going to explore and see how this compression can be beneficial.
Solution
SQL Server 2016 CTP provides new built in functions for compression and decompression of data.
- Compress: This function compresses the input data using the GZIP algorithm and returns the binary data of type Varbinary(max)
- Decompress: This function decompresses the compressed input binary data using the GZIP algorithm and returns the binary data of type Varbinary(max).
The basic syntax for these functions are shown below, where the expression can be nvarchar(n), nvarchar(max), varchar(n), varchar(max), varbinary(n), varbinary(max), char(n), nchar(n), or binary(n).
COMPRESS ( expression ) DECOMPRESS ( expression )
COMPRESS and DECOMPRESS functions can be used as shown below:
SELECT COMPRESS ('Compress and Decompress Function SQL Server')
If we decompress the above output we get the following:
SELECT DECOMPRESS (0x1F8B080000000000040073CECF2D284A2D2E5648CC4B5170494D8671DD4AF3924B32F3F31482037D1482538BCA528B004E16B8732B000000)
As we can see from the above output DECOMPRESS function does not return the string which we have compressed above, we need to explicitly CAST the result of the DECOMPRESS function to a string datatype.
SELECT CAST(0x436F6D707265737320616E64204465636F6D70726573732046756E6374696F6E2053514C20536572766572 AS VARCHAR(MAX))
COMPRESS and DECOMPRESS Examples
Now to explore this further let's create a standard table and insert some data using the COMPRESS function.
CREATE TABLE dbo.product ( Id INT IDENTITY(1,1), Name NVARCHAR(max), Description VARBINARY(MAX) ) GO INSERT INTO dbo.product (Name, Description) VALUES('TestDemo', COMPRESS(N'This Demo is to show how we can use the new Compress and decompress function in sql server 2016 CTP 3.1 onwards'))
Now if we do a normal SELECT statement the output will be:
So we have to use CAST as mentioned above to see the actual text. Here we are using CAST and DECOMPRESS to output the data to DecompressedTest.
SELECT Id, Name, description, CAST( DECOMPRESS(description) AS NVARCHAR(MAX)) AS DecomressedTest FROM dbo.product
Storage Savings for Compressed Data
Now let's see where compression can be useful. For this we created three strings having different data lengths and compared the lengths before and after compression.
DECLARE @TextToCompress1 VARCHAR(MAX) DECLARE @TextToCompress2 VARCHAR(MAX) DECLARE @TextToCompress3 VARCHAR(MAX) SELECT @TextToCompress1 = 'Data Compression SQL SERVER' SELECT @TextToCompress2 = 'The compression technique which were available with Prior Version of SQL server are Row level and Column Level compression. SQL Server 2016 CTP 3.1 Onwards we have newly introduced functions for Compression. In this Tip we are going to explore and see how this can be beneficial.' SELECT @TextToCompress3='The compression technique which were available with Prior Version of SQL server are Row level and Column Level compression. SQL Server 2016 CTP 3.1 Onwards we have newly introduced functions for Compression. In this Tip we are going to explore and see how this can be beneficial. The COMPRESS function compresses the data provided as the input expression and must be invoked for each section of data to be compressed. This is demo purposes only.' SELECT DATALENGTH(@TextToCompress1) 'Before_compression-1', DATALENGTH(COMPRESS(@TextToCompress1)) 'After_compression-1', DATALENGTH(@TextToCompress2) 'Before_compression-2', DATALENGTH(COMPRESS(@TextToCompress2)) 'After_compression-2', DATALENGTH(@TextToCompress3) 'Before_compression-3', DATALENGTH(COMPRESS(@TextToCompress3)) 'After_compression-3'
We can see here compression is not useful for the small data lengths, but it could be very beneficial for large data lengths, so we need to evaluate as it may create more overhead if not planned properly.
Using the correct CAST options with DECOMPRESS
As we have seen, DECOMPRESS doesn't return the actual string unless we use CAST, so if the correct data type is not used while using DECOMPRESS it will not give the desired output.
DECLARE @varcharValue NVARCHAR(MAX) = 'SQL Server 2016 Compress and Decompress function', @compressedValue VARBINARY(MAX) SET @compressedValue = COMPRESS(@varcharValue) SELECT @varcharValue OriginalValue, CAST(DECOMPRESS(@compressedValue) AS VARCHAR(MAX)) AS Decompress1, CAST(DECOMPRESS(@compressedValue) AS NVARCHAR(10)) AS Decompress2, CAST(DECOMPRESS(@compressedValue) AS NVARCHAR(20)) AS Decompress3, CAST(DECOMPRESS(@compressedValue) AS NVARCHAR(MAX)) AS Decompress4
Storage Savings and Performance Gains with Compression
Now let's look at another where compression is useful. We will create some data and store the same data compressed and uncompressed and compare the storage needs.
CREATE TABLE Test_uncompress (test varchar(max)) --table to hold uncompressed data CREATE TABLE Test_compressed (test Varbinary(max)) -- table to hold compressed data -- Now insert test data into uncompressed table INSERT INTO Test_uncompress values (replicate('DemoCompress and uncompress function',5000)) GO 200000 --Insert data into compressed table by doing compress of values stored in uncompressed table INSERT INTO Test_compressed SELECT compress(test) from Test_uncompress
If we look at sp_spaceused for both tables we can see the comparison below:
We can see the data size is 20808 KB in the compressed table and 1600200 KB for the uncompressed table.
Also, if we do straight SELECTs from both tables we can also see a difference in performance, but this example does not CAST the data which may increase the time of the second query.
set statistics time on set statistics IO on SELECT TOP 1000 * FROM dbo.Test_uncompress GO SELECT TOP 1000 * FROM Test_compressed GO
As shown above, we can see both the logical reads and elapsed time are reduced for the compressed data thus having a performance benefit.
Note: One important point to note is that compressed data cannot be indexed.
Next Steps
- Check out SQL Server 2016 tips
- Read more about Compress Function
- Read more about Decompress Function
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips