Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Columnstore index feature in SQL Server 2012


By:   |   Last Updated: 2012-04-02   |   Comments (9)   |   Related Tips: More > Indexing

Problem

A new feature in SQL Server 2012 is the Columnstore Index which can be used to significantly improve query performance. In this tip we will take a look of how it works and how we can use it.

Solution

There are two types of storage available in the database; RowStore and ColumnStore.

In RowStore, data rows are placed sequentially on a page while in ColumnStore values from a single column, but from multiple rows are stored contiguously. So a ColumnStore Index works using ColumnStore storage.

column store versus row store in SQL Server

Now let's show how we can create a ColumnStore Index and how performance can be improved.

Creating a Column Store Index

Creating a ColumnStore Index is the same as creating a NonClustered Index except we need to add the ColumnStore keyword as shown below.

The syntax of a ColumnStore Index is:

CREATE NONCLUSTERED COLUMNSTORE INDEX ON Table_Name (Column1,Column2,... Column N)

Performance Test

I used the AdventureWorks sample database for performing tests.

--Create the Test Table
USE [AdventureWorks2008R2]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Test_Person](
 [BusinessEntityID] [int] NOT NULL,
 [PersonType] [nchar](2) NOT NULL,
 [NameStyle] [dbo].[NameStyle] NOT NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] [dbo].[Name] NOT NULL,
 [MiddleName] [dbo].[Name] NULL,
 [LastName] [dbo].[Name] NOT NULL,
 [Suffix] [nvarchar](10) NULL,
 [EmailPromotion] [int] NOT NULL,
 [AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
 [Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
-- We Populated this table with the Data Stored in Table Person.Person.
-- As we need Plenty of data so we ran the loop 100 times.
INSERT INTO [dbo].[Test_Person] 
SELECT P1.*
FROM Person.Person P1
GO 100
-- At this point we have 1,997,200 rows in the table.
-- Create Clustered Index  on Coloun [BusinessEntityID] 
CREATE CLUSTERED INDEX [CL_Test_Person] ON [dbo].[Test_Person]
( [BusinessEntityID])
GO
-- Creating Non - CLustered Index on 3 Columns
CREATE NONCLUSTERED INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

-- Creating Non - CLustered  ColumnStore Index on 3 Columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

At this point we have created the ColumnStore Index on our test table. Now we will run the SELECT query with and without the ColumnStore Index and analyze performance.

Query Without ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

We have used the OPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) query hint to not to use the ColumnStore Index this time.

Query With ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]

Here are the Actual Execution Plans for both queries:

sql server query plan for non columnstore query

sql server query plan for columnstore query

We can see the cost when using the NonClustered Index is 59 % while using the ColumnStore index is 13%.

Now if we hover the mouse over the Index Scans we can see details for these operations.  The below is a comparison:

execution plan output for columnstore query

It is clear from the results that the query performs extremely fast after creating the ColumnStore Index as the column needed for the query is stored in the same page and the query does not have to go through every single page to read these columns.

Performing INSERT, DELETE or UPDATE Operations

We cannot perform DML ( Insert\ Update \ Delete ) operations on a table having a ColumnStore Index, because this puts the data in a Read Only mode. So one big advantage of using this feature is a Data Warehouse where most operations are read only.

For example, if you perform a DELETE operation on a table with a ColumnStore Index you will get this error:

 Msg 35330, Level 15, State 1, Line 1
DELETE statement failed because data cannot be updated 
in a table with a columnstore index. Consider disabling the 
columnstore index before issuing the DELETE statement, 
then rebuilding the columnstore index after DELETE is complete.

However, to perform the operation we would need to disable the ColumnStore Index before issuing the command as shown below:

ALTER INDEX 'Index Name' on 'table name' DISABLE

Creating a ColumnStore Index using Management Studio

Right click and select New Index and select Non-Clustered Columnstore Index...

create columnstore index using SSMS

Click add to add the columns for the index.

create columnstore index using SSMS select columns

After selecting the columns click OK to create the index.

save columnstore index using SSMS

Limitations of a ColumnStore Index

  1. It cannot have more than 1024 columns.
  2. It cannot be clustered, only NonClustered ColumnStore indexes are available.
  3. It cannot be a unique index.
  4. It cannot be created on a view or indexed view.
  5. It cannot include a sparse column.
  6. It cannot act as a primary key or a foreign key.
  7. It cannot be changed using the ALTER INDEX statement. You have to drop and re-create the ColumnStore index instead. (Note: you can use ALTER INDEX to disable and rebuild a ColumnStore index.)
  8. It cannot be created with the INCLUDE keyword.
  9. It cannot include the ASC or DESC keywords for sorting the index.
Next Steps


Last Updated: 2012-04-02


get scripts

next tip button



About the author
MSSQLTips author Rajendra Gupta Rajendra is a Consultant DBA with 9+ years of extensive experience in database administration including large critical OLAP, OLTP, Reporting and SharePoint databases.

View all my tips
Related Resources




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Wednesday, April 04, 2012 - 2:30:20 AM - manu Back To Top

http://msdn.microsoft.com/en-us/library/gg492088.aspx

Refer to above link for more information like data types limitation, query performance and all. Thanks.


Monday, April 02, 2012 - 10:59:26 PM - rajendra gupta Back To Top

If we run the query with sSTATISTICS IO and TIME On as

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE


SET STATISTICS IO ON
SET STATISTICS TIME ON


--Query will be paster here

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

with and without ColumnStore index Statistics output will be

With ColumnStore index logical reads 28     CPU time = 16 ms,  elapsed time = 291 ms.
WithOut  ColumnStore index  logical reads 381942 CPU time = 9313 ms,  elapsed time = 14604 ms.

 

Columnstore indexes can produce faster results for the following reasons:

1.Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.

2.Columns are heavily compressed. This reduces the number of bytes that must be read and moved.

3.Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.

4.Advanced query execution technology processes chunks of columns called batches , reducing CPU usage. Check it in the image shown above. Execution mode - Batch.


And Also Replication ,Page and row compression cannot be used with ColumnStore indexes.

 

 

 

 

 



 

 

 

 

 

 

With ColumnStore index logical reads 28     CPU time = 16 ms,  elapsed time = 291 ms.
WithOut  ColumnStore index  logical reads 381942 CPU time = 9313 ms,  elapsed time = 14604 ms.


Monday, April 02, 2012 - 2:12:09 PM - Bill Back To Top

Since DML cannot be performed on tables that have a columnstore index, I assume that such a table cannot be used in a table defined as a subsriber in replication, correct?


Monday, April 02, 2012 - 1:43:52 PM - Suresh Back To Top

Thanks, will try this feature after I get a SQL 2012 environment. This looks interesting.


Monday, April 02, 2012 - 12:51:01 PM - Ameena Back To Top

Hi Rajendra,

Thanks for the great post. I followed your example and found certain interesting things not mentioned in your post but anybody who will try this example of yours will find out. First of all there can only be one Columnstore index on a table. Second is column cannot be an identity column.

Then I tried running both of your quereis with Set Statistics io on and Set statistics time on. This is what I found in my environment.

Query without Columnstore index took 89 ms and 107 logical reads.

Query with Columnstore indext took 79 ms and only 24 logical reads.

So basically like you said there will less IO when columnstore index is used.

 

Thanks

 

 


Monday, April 02, 2012 - 9:33:48 AM - Ammanuel Lakew Back To Top

 

Perhaps my first question and this is out of the context of the topic. Hey I am so Stack with writing a query to copy one column from one table to another in MS-SQL. Is that possible and can you help me with that please? It is highly appriciated.


Monday, April 02, 2012 - 8:41:00 AM - murt Back To Top

what about the compression side of things and exactly why column store indexes are faster?

 


Monday, April 02, 2012 - 8:24:28 AM - Aaron Bertrand Back To Top

You should also note that only certain data types are valid for a columnstore index.


Monday, April 02, 2012 - 7:50:56 AM - John McVicker Back To Top

With your examples, could you also include actual queries with statistics IO and TIME turned on?  You show the estimate values but not actual values.  They should show both in-memory and disk-based runtimes to show the benefit of using fewer Reads.  You say "it is clear from the results..." but are only showing the estimates while you do have the actual data to work with.


Learn more about SQL Server tools