Implementing Data Compression in SQL Server 2008
As we are looking through the new features in SQL Server 2008 we found a potentially interesting one called Data Compression which we can use to reduce the size of tables, indexes or a subset of their partitions. Can you give us a detailed explanation of how we go about using this one?
Data Compression in SQL Server has been made available in SQL Server 2005 with Service Pack 2 where a new storage format for storing decimal and numeric data was introduced. The vardecimal storage format allows decimal and numeric data types to be stored as a variable-length column. This concept has been extended in SQL Server 2008 to all fixed-legth data types such as integer, char, and float data types. Data compression reduces the storage costs and increases query performance by reducing I/O and increasing buffer-hit rates.
SQL Server 2008 supports both row and page compression for both tables and indexes. Here's how these two types of data compression differ:
- ROW Compression. This compression feature takes into account the variable data type structures defining a column. Take, for instance, a CHAR(100) column stored in a variable length storage format will only use up the amount of storage defined by the data. Storing "SQL Server 2008 " in the column will only require storing fifteen characters instead of the full 100 characters, thereby, a 85% savings on storage space. This is the extension of the vardecimal storage format made available in SQL Server 2005 Service Pack 2. Note, also, that this compression feature does not take any disk space for zero or null values.
- PAGE Compression. This compression feature is a superset of ROW compression and takes into account the redundant data in one or more rows on a given page. It also uses prefix and dictionary compression. What this simply means is that for both page compression techniques, the storage engine cuts down on repeated data in the page. For example, if a table is partitioned using a column prefix, all data in a specific partition will have the same or similar prefix. Let's say the value of columns start with something like A1000Q-xxxx like some product codes, the storage engine store the A1000Q- once on the page and then refer to this value from all other occurrences of this value on the same page. This can also be said of a column with a defined DEFAULT constraint. Page compression only occurs when the page is full to optimize the performance.
While it may appear that data compression would reduce the size of your tables or indexes, it is best to first evaluate the estimated space savings in a table or index by using either the sp_estimate_data_compression_savings system stored procedure or the Data Compression Wizard. You also might want to check if the existing data is fragmented as you might be able to reduce the size of the index by rebuilding it instead of using compression.
Estimating storage space savings
We can use the sp_estimate_data_compression_savings system stored procedure or the Data Compression Wizard to find out the estimated storage space savings for atable of index. To use the sp_estimate_data_compression_savings system stored procedure on the AdventureWorks database:
- first parameter is the schema name,
- the second parameter is the object name,
- the third parameter is the index id,
- the fourth parameter is the partition id
- and the last parameter is the compression type.
In the example above, I will be looking at ROW compression on the Sales.SalesOrderDetail table in the AdventureWorks database. Running the query will give you the results shown below.
Note the columns size_with_current_compression_setting (KB) and size_with_requested_compression_setting (KB) which will give you an estimate on the space savings you will achieve in the Sales.SalesOrderDetail table, assuming there is no fragmentation.
To use the Data Compression Wizard, right-click on the Sales.SalesOrderDetail table, select Storage and click Manage Compression.
This will launch the Data Compression Wizard. On the Welcome to Data Compression Wizard, click Next.
On the Select Compression Type page, click the drop-down menu on the Compression Type column to select your compression of choice. I'll choose ROW since this is the option used using the sp_estimate_data_compression_savings system stored procedure. Click the Calculate button to display a similar information provided by the stored procedure.
These two methods will give you an estimate of the storage space savings you will get by compressing your tables or indexes.
Enabling Compression on a Table
To enable compression on an existing non-partitioned table, use the ALTER TABLE command with the REBUILD option:
The syntax above enables ROW compression on Sales.SalesOrderDetail table. You can also continue with the Data Compression Wizard to enable compression on a table. After clicking the Calculate button on the Select Compression Type page, click Next. On the Select an Output Option page, you can chose to either create a script for review or saving to a file, run immediately or schedule as a job. As a DBA, you would have to do this as part of your regular database maintenance so you might as well create a script for it and later on include in a database maintenance job. Click on the Next button to continue.
Review the Data Compression Summary page before clicking Finish.
You may want to create a script that iterates thru all of your tables to generate a summary report using the sp_estimate_data_compression_savings system stored procedure which will help you in analyzing which tables or indexes you can compress.
As anything comes at a cost, data compression will definitely cost you CPU resources. You should evaluate and test whether the benefits outweigh the cost.
- Download an evaluation copy of SQL Server 2008 from this site.
- You can get the AdventureWorks database used in the sample here (click on AdventureWorksDB.msi).
- Try out the examples above on your sample test databases to have an idea on the benefits you will gain from data compression
- Check out Chad Boyd's blog entry on Data Compression in SQL Server 2008.
About the author
View all my tips