Columnstore index feature in SQL Server 2012

By:   |   Comments (9)   |   Related: > Indexing


Problem

A new feature in SQL Server 2012 is the Columnstore Index which can be used to significantly improve query performance. In this tip we will take a look of how it works and how we can use it.

Solution

There are two types of storage available in the database; RowStore and ColumnStore.

In RowStore, data rows are placed sequentially on a page while in ColumnStore values from a single column, but from multiple rows are stored contiguously. So a ColumnStore Index works using ColumnStore storage.

column store versus row store in SQL Server

Now let's show how we can create a ColumnStore Index and how performance can be improved.

Creating a Column Store Index

Creating a ColumnStore Index is the same as creating a NonClustered Index except we need to add the ColumnStore keyword as shown below.

The syntax of a ColumnStore Index is:

CREATE NONCLUSTERED COLUMNSTORE INDEX ON Table_Name (Column1,Column2,... Column N)

Performance Test

I used the AdventureWorks sample database for performing tests.

--Create the Test Table
USE [AdventureWorks2008R2]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Test_Person](
 [BusinessEntityID] [int] NOT NULL,
 [PersonType] [nchar](2) NOT NULL,
 [NameStyle] [dbo].[NameStyle] NOT NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] [dbo].[Name] NOT NULL,
 [MiddleName] [dbo].[Name] NULL,
 [LastName] [dbo].[Name] NOT NULL,
 [Suffix] [nvarchar](10) NULL,
 [EmailPromotion] [int] NOT NULL,
 [AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
 [Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
-- We Populated this table with the Data Stored in Table Person.Person.
-- As we need Plenty of data so we ran the loop 100 times.
INSERT INTO [dbo].[Test_Person] 
SELECT P1.*
FROM Person.Person P1
GO 100
-- At this point we have 1,997,200 rows in the table.
-- Create Clustered Index  on Coloun [BusinessEntityID] 
CREATE CLUSTERED INDEX [CL_Test_Person] ON [dbo].[Test_Person]
( [BusinessEntityID])
GO
-- Creating Non - CLustered Index on 3 Columns
CREATE NONCLUSTERED INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

-- Creating Non - CLustered  ColumnStore Index on 3 Columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

At this point we have created the ColumnStore Index on our test table. Now we will run the SELECT query with and without the ColumnStore Index and analyze performance.

Query Without ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

We have used the OPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) query hint to not to use the ColumnStore Index this time.

Query With ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]

Here are the Actual Execution Plans for both queries:

sql server query plan for non columnstore query

sql server query plan for columnstore query

We can see the cost when using the NonClustered Index is 59 % while using the ColumnStore index is 13%.

Now if we hover the mouse over the Index Scans we can see details for these operations.  The below is a comparison:

execution plan output for columnstore query

It is clear from the results that the query performs extremely fast after creating the ColumnStore Index as the column needed for the query is stored in the same page and the query does not have to go through every single page to read these columns.

Performing INSERT, DELETE or UPDATE Operations

We cannot perform DML ( Insert\ Update \ Delete ) operations on a table having a ColumnStore Index, because this puts the data in a Read Only mode. So one big advantage of using this feature is a Data Warehouse where most operations are read only.

For example, if you perform a DELETE operation on a table with a ColumnStore Index you will get this error:

 Msg 35330, Level 15, State 1, Line 1
DELETE statement failed because data cannot be updated 
in a table with a columnstore index. Consider disabling the 
columnstore index before issuing the DELETE statement, 
then rebuilding the columnstore index after DELETE is complete.

However, to perform the operation we would need to disable the ColumnStore Index before issuing the command as shown below:

ALTER INDEX 'Index Name' on 'table name' DISABLE

Creating a ColumnStore Index using Management Studio

Right click and select New Index and select Non-Clustered Columnstore Index...

create columnstore index using SSMS

Click add to add the columns for the index.

create columnstore index using SSMS select columns

After selecting the columns click OK to create the index.

save columnstore index using SSMS

Limitations of a ColumnStore Index

  1. It cannot have more than 1024 columns.
  2. It cannot be clustered, only NonClustered ColumnStore indexes are available.
  3. It cannot be a unique index.
  4. It cannot be created on a view or indexed view.
  5. It cannot include a sparse column.
  6. It cannot act as a primary key or a foreign key.
  7. It cannot be changed using the ALTER INDEX statement. You have to drop and re-create the ColumnStore index instead. (Note: you can use ALTER INDEX to disable and rebuild a ColumnStore index.)
  8. It cannot be created with the INCLUDE keyword.
  9. It cannot include the ASC or DESC keywords for sorting the index.
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Rajendra Gupta Rajendra Gupta is a Consultant DBA with 14+ years of extensive experience in database administration including large critical OLAP, OLTP, Reporting and SharePoint databases.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, April 4, 2012 - 2:30:20 AM - manu Back To Top (16766)

http://msdn.microsoft.com/en-us/library/gg492088.aspx

Refer to above link for more information like data types limitation, query performance and all. Thanks.


Monday, April 2, 2012 - 10:59:26 PM - rajendra gupta Back To Top (16745)

If we run the query with sSTATISTICS IO and TIME On as

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE


SET STATISTICS IO ON
SET STATISTICS TIME ON


--Query will be paster here

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

with and without ColumnStore index Statistics output will be

With ColumnStore index logical reads 28     CPU time = 16 ms,  elapsed time = 291 ms.
WithOut  ColumnStore index  logical reads 381942 CPU time = 9313 ms,  elapsed time = 14604 ms.

 

Columnstore indexes can produce faster results for the following reasons:

1.Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.

2.Columns are heavily compressed. This reduces the number of bytes that must be read and moved.

3.Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.

4.Advanced query execution technology processes chunks of columns called batches , reducing CPU usage. Check it in the image shown above. Execution mode - Batch.


And Also Replication ,Page and row compression cannot be used with ColumnStore indexes.

 

 

 

 

 



 

 

 

 

 

 

With ColumnStore index logical reads 28     CPU time = 16 ms,  elapsed time = 291 ms.
WithOut  ColumnStore index  logical reads 381942 CPU time = 9313 ms,  elapsed time = 14604 ms.


Monday, April 2, 2012 - 2:12:09 PM - Bill Back To Top (16742)

Since DML cannot be performed on tables that have a columnstore index, I assume that such a table cannot be used in a table defined as a subsriber in replication, correct?


Monday, April 2, 2012 - 1:43:52 PM - Suresh Back To Top (16741)

Thanks, will try this feature after I get a SQL 2012 environment. This looks interesting.


Monday, April 2, 2012 - 12:51:01 PM - Ameena Back To Top (16740)

Hi Rajendra,

Thanks for the great post. I followed your example and found certain interesting things not mentioned in your post but anybody who will try this example of yours will find out. First of all there can only be one Columnstore index on a table. Second is column cannot be an identity column.

Then I tried running both of your quereis with Set Statistics io on and Set statistics time on. This is what I found in my environment.

Query without Columnstore index took 89 ms and 107 logical reads.

Query with Columnstore indext took 79 ms and only 24 logical reads.

So basically like you said there will less IO when columnstore index is used.

 

Thanks

 

 


Monday, April 2, 2012 - 9:33:48 AM - Ammanuel Lakew Back To Top (16736)

 

Perhaps my first question and this is out of the context of the topic. Hey I am so Stack with writing a query to copy one column from one table to another in MS-SQL. Is that possible and can you help me with that please? It is highly appriciated.


Monday, April 2, 2012 - 8:41:00 AM - murt Back To Top (16734)

what about the compression side of things and exactly why column store indexes are faster?

 


Monday, April 2, 2012 - 8:24:28 AM - Aaron Bertrand Back To Top (16733)

You should also note that only certain data types are valid for a columnstore index.


Monday, April 2, 2012 - 7:50:56 AM - John McVicker Back To Top (16731)

With your examples, could you also include actual queries with statistics IO and TIME turned on?  You show the estimate values but not actual values.  They should show both in-memory and disk-based runtimes to show the benefit of using fewer Reads.  You say "it is clear from the results..." but are only showing the estimates while you do have the actual data to work with.















get free sql tips
agree to terms