Supercharge your Data Warehouse with Columnstore Indexes - Webinar Q&A
By: Koen Verbeeck | Updated: 2017-09-11 | Comments | Related: More > Indexing
An MSSQLTips webinar was presented about the columnstore index features in the SQL Server 2016 database engine. This tip answers some of the questions asked during the webinar. You can view the webinar on demand here.
Question 1 – Rebuilding a Columnstore Index
With SQL Server 2016 rebuilding columnstore indexes is not mandatory anymore to get rid of columnstore index "fragmentation", but is it always efficient regarding the fragmentation level of the index?
Columnstore indexes don’t have fragmentation in the same sense that traditional (aka row-store) indexes have. The “fragmentation” that you have in columnstore indexes is mainly determined by the number of deleted rows in a rowgroup.
Maintenance on a columnstore will focus on two parts:
- The deleted bitmap. This is a bitmap which indicates which rows have been (logically) deleted from the rowgroups. This is often referred to as “columnstore index fragmentation”. You can also have segments where all data has been deleted, which can also count as some sort of fragmentation.
- The open delta stores. These are row-storage heaps where new data is inserted into before it is moved to a compressed row group.
You have two maintenance commands for indexes:
- Reorganize. This will invoke the Tuple Mover,
which will turn closed delta stores into compressed rowgroups. If you specify
the hint COMPRESS_ALL_ROW_GROUPS_ON, all delta stores (closed and open) are
compressed. Since SQL Server 2016, reorganize is a bit more aggressive:
- When 10% or more of the rows in a rowgroup are deleted, they are physically removed and the rowgroup is recompressed.
- Multiple rowgroups can be combined into one single rowgroup to reach the maximum of 1,024,576 rows.
- A combination of the previous two is possible where logically deleted rows are removed and rowgroups are merged together.
- Rebuild. The entire columnstore index will be rebuild. Two copies of the columnstore index will exist at the same time, where the old one is dropped once the rebuild is finished. Data can be reshuffled between rowgroups (which means alignment for segment elimination can be disturbed). All deleted rows are removed and there are no open delta stores.
To answer the question: reorganize will only help with fragmentation (deleted rows) in SQL Server 2016 if there are more than 10% rows deleted in the rowgroup. If not, only rebuild can get rid of deleted rows.
You can find more information in the blog post Maintenance Solutions for Columnstore or in the official documentation.
Question 2 – How much data should be in a table before a columnstore index makes sense?
A columnstore index really pays off when you have large scans of the data. This is because of the high compression ratio and segment elimination. In order to be a bit useful, a columnstore index should at least have a couple of rowgroups, which means a couple millions of rows.
However, it’s not only the number of rows that count, but also the size on disk. If your table is only 50MB, you probably won’t notice much difference in IO savings (but your queries can run in batch mode though). Once your table goes over 1GB, you can see some real improvements because compression might reduce the size to about 100MB (theoretically). You can also have great compression savings by using page compression though. So it’s the combination of a large number of rows and large size on disk – combined with typical data warehouse queries - that becomes the sweet spot for columnstore indexes.
Question 3 – Rebuilding the Index
Rather than rebuilding a clustered columnstore index, would it be better to drop it, create a clustered rowstore index on the column you're going to query, then build the columnstore index?
The only advantage you will get from this method is that the columnstore index segments will be aligned for the column specified in the row-store clustered index. This improves segment elimination. If you just rebuild your columnstore index, your segments might become misaligned.
You can find more info in the blog posts Data Loading for Better Segment Elimination, The best column for sorting Columnstore Index on and Segment Alignment Maintenance.
Question 4 – Which columns should you include in the columnstore index?
If you create a clustered columnstore index, you have no choice: all columns are included. If you create a non-clustered columnstore index you have the option to specify columns. As a general rule I would include all columns, unless:
- You are certain the column will never be needed by an analytical query for which the index is created
- The column is not supported by a columnstore index. You can find a list of limitations in the documentation.
- If the column has too many distinct values and cannot be easily compressed. A good example are comments fields. They are highly unique and typically contain lots of text. This will blow up the size of the dictionary and results in rowgroups with less rows than the maximum of 1,024,576 rows.
Question 5 - Will adding the column store index on a table impact the performance of database?
Yes. The whole point of adding columnstore index is to improve read performance, especially for data warehouse style of queries. However, like all indexes, they will also impact queries that modify data (inserts, deletes and updates). Especially updates are slower, because they are split out into deletes (which cause fragmentation as discussed in the first question) and inserts (which might end up in open uncompressed delta stores).
Question 6 – Enterprise Edition only?
Are columnstore indexes only supported in Enterprise Edition? Will columnstore indexes also work in Developer Edition or any other edition?
In SQL Server 2012 and 2014 columnstore indexes are a feature of Enterprise Edition. However, as for all releases of SQL Server, some editions have all the features of Enterprise Edition:
- Developer Edition
- Evaluation Edition
This means you can use Enterprise, Developer and Evaluation edition to test out columnstore indexes.
Since SQL Server 2016 service pack 1, columnstore indexes are also available in Standard Edition (the Business Intelligence Edition has been removed in SQL Server 2016), as well as in Web and Express edition. So make sure you upgrade your SQL Server 2016 instance to the latest service pack!
You can find more information in the documentation.
Question 7 – Compression
Using column store indexes the data is compressed right? What is the percentage of savings that we get in storage? What percentage is the data compressed?
Compression is one of the best features of columnstore indexes. You might expect compression up to 10x less the original size. However, a lot depends on the number of rows and the columns. For the columns, two factors play a very big role in the compression ratio:
- The type of data. Integers will probably compress better than decimals because of the higher precision.
- The cardinality (or uniqueness) of the data. A column containing country names (lots of duplicates) will compress better than a column containing employee names, because the latter has more distinct values.
Question 8 – Columnstore Indexes and OLTP
Can a columnstore index be used in an OLTP database?
Absolutely. This is referred to as “real-time operational analytics” by Microsoft. You can even add columnstore indexes on an in-memory OLTP table (which makes the columnstore index really “in memory”). However, there are some guidelines to consider:
- Preferably you would use a non-clustered columnstore index (since SQL Server 2016) instead of clustered columnstore index. Since OLTP systems are optimized for write access (insert, update, delete), you might want to avoid this in a clustered index, since these types of operations are slower for a clustered columnstore index.
- Only use columnstore indexes if you have analytical queries that scan large amounts of data in your OLTP system. Traditional indexes are far better in queries that seek data or those who scan a small amount of data.
As usual, test well before you implement.
Question 9 – Candidate Tables
How to identify the best candidates tables for a columnstore index?
As mentioned a couple of times in other answers, columnstore indexes work best for large tables. In a data warehouse scenario, these are most likely your biggest fact tables and possible very large dimensions.
Take a look at the tip Identify the best tables for SQL Server 2016 Columnstore Index Migration for more information.
- If you haven’t already, take a look at the recording of the webinar.
- Niko Neugebauer has an excellent series on columnstore indexes. Already over 100 blog posts!
- There are a couple of tips on the topic of columnstore indexes, check them out as well!
About the author
View all my tips
Article Last Updated: 2017-09-11