Getting started with Indexes on SQL Server Memory Optimized Tables

By: Arshad Ali | Comments (2) | Related: > In Memory OLTP

Problem

SQL Server 2014 introduces the In-memory OLTP engine which is fully integrated in SQL Server and allows you to create memory optimized tables. Data for these memory optimized tables gets stored in memory in a completely different structure than traditional disk based tables. But what about the indexes? What types of indexes can be created on memory optimized tables and when they can be used? How does this work?

Solution

SQL Server 2014 introduces the In-memory OLTP engine which is fully integrated in SQL Server and allows you to create memory optimized tables. These memory optimized tables have a completely different data and index structure with no locking or latching required when you access the data. This design results in a minimal amount of waiting time and no blocking. Data for the memory optimized tables gets stored in memory in a completely different structure and there are no data pages or index pages or buffer pool for memory optimized tables. Please refer to these tips in the series to learn more about In-memory OLTP engine and memory optimized tables:

There are two types of indexes which can be created on memory optimized tables, namely a HASH index or RANGE index. A memory-optimized table must always have at least one index, although if you create a primary key on the table, this requirement will be satisfied. You can create up to 8 indexes on a memory optimized table including the one supporting the primary key although no unique index is supported other than the primary key.

All 8 indexes can only be non-clustered indexes. These indexes don't duplicate data, but rather they just point to the rows in the chain. These indexes are not logged and don't get stored on disk. The indexes are maintained online and created every time during recovery. Does this mean it will not ever get fragmented? Yes you are right, it will not and hence you don't have the overhead in maintaining these indexes and therefore it will not impact disk IO. There are certain aspects which are addressed by the garbage collection process of the In-memory engine and this should not be a major concern.

One very important point to note about In-Memory OLTP indexes are that they get created when you create the table and there is no way to drop, add or modify indexes on the memory optimized table. This is because the index pointers are stored as part of the row structure as you can see in my last tip in this series. Having said that, you need to do thorough analysis and proper index planning, based on the workload or queries you will have, before you create the memory optimized table. Keep in mind one index is required when creating memory optimized tables or you will get an exception similar as shown below:

Msg 41327, Level 16, State 7, Line 1
The memory optimized table 'Customer1' must have at least one index or a primary key.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint or index. See previous errors.

SQL Server 2014 In-Memory OLTP Hash Indexes

Hash Indexes are an ideal candidate if you have lots have queries which are going to do equi-joins (using "=" operator). Unlike B-Tree indexes on traditional disk based tables, these indexes don't duplicate data, but rather they just point to the rows in the chain. It is mandatory to have at least one index on the memory optimized table; in other words indexes are used as entry points for memory-optimized tables. Reading rows from a table requires an index to locate the data in memory.

As mentioned earlier, the In-memory OLTP engine maintains these indexes online and they get created every time during server recovery. Also, all these are covering, meaning they include all the columns in the table. As you can see below, when creating a Hash index you need to specify BUCKET COUNT and the recommended value for this should be two times the expected unique values of that column. An internal hash function maps index keys to corresponding buckets in the hash index. The hash function is deterministic and always maps the same index key to the same bucket in the hash index and at the same time multiple matched index keys might be mapped to the same hash bucket. This situation is called hash collision and a large number of hash collisions can have a performance impact on read operations. That's the reason the recommended value for BUCKET COUNT should be two times of expected unique values of that column.

Let me show you some example on how to create a hash index and how it works. First of all, let me create a memory optimized table with hash indexes as shown below. As you can see, I have created a primary key hash index on the CustomerID column and another hash index on the Age column:

CREATE TABLE [CustomerWithHashIndex](
  [CustomerID] INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
  [Name] NVARCHAR(250) NOT NULL,
  [CustomerSince] DATETIME NULL,
  Age SMALLINT NOT NULL INDEX [IAge] NONCLUSTERED HASH WITH (BUCKET_COUNT = 200)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

The script below loads data into the above created memory optimized table with hash indexes:

DECLARE @counter int = 1
DECLARE @Name nchar(4) = N'ABCD'
DECLARE @rowcount int = 1000 --100000
DECLARE @CustomerSince DATETIME = '01/01/1900'
WHILE @counter <= @rowcount
BEGIN
   INSERT INTO [dbo].[CustomerWithHashIndex] (CustomerID, Name, CustomerSince, Age) 
   VALUES (@counter, @Name + CONVERT(VARCHAR(10), @counter), @CustomerSince+@counter, RAND()*100)
   SET @counter += 1
END
GO

Now let's execute these two queries, one with equality operator and one with range of values and analyze the execution plan of each:

SELECT * FROM [CustomerWithHashIndex]
WHERE Age = 50
SELECT * FROM [CustomerWithHashIndex]
WHERE Age BETWEEN 50 AND 60

As you can see in the below image, the first execution plan uses the hash index (Index Seek) for the query with equality operator whereas the second execution plan does not use hash index at all and uses a Table Scan operator instead for the query since it needs a range predicate:

Getting started with Indexes on Memory Optimized Table

SQL Server 2014 In-Memory OLTP Range Indexes

The Range index is an ideal candidate in case if you have lots of queries which perform a range selection (using ">" or "<" operators) or you might choose a range index if you are not sure about the BUCKET COUNT value for the hash index. Range indexes are implemented using Bw-trees, a novel lock-free version of B-trees to fully utilize the emergence of new hardware and platforms. Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor cache of modern multi-core chips. To learn more about Bw-trees of range index, please refer to this research paper or this whitepaper by Kalen Delaney.

Now let me demonstrate an example on how to create a range index and how it works. First of all, let me create a memory optimized table with a hash index and a range index. As you can see in the script below, I have created a primary key hash index on the CustomerID column and a range index on the Age column. As you can see, unlike with the Hash index there is no special keyword to use when creating a range index; you create it in a way you create a traditional non clustered index on the table though range indexes are implemented using a new data structure called a Bw-tree on the memory optimized table which is a lock and latch free variation of a B-tree as discussed above:

CREATE TABLE [CustomerWithRangeIndex](
  [CustomerID] INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
  [Name] NVARCHAR(250) NOT NULL,
  [CustomerSince] DATETIME NULL,
  Age SMALLINT NOT NULL INDEX [IAge] NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

The script below loads some data into the above created memory optimized table with hash and range indexes:

DECLARE @counter int = 1
DECLARE @Name nchar(4) = N'ABCD'
DECLARE @rowcount int = 1000
DECLARE @CustomerSince DATETIME = '01/01/1900'
WHILE @counter <= @rowcount
BEGIN
  INSERT INTO [dbo].[CustomerWithRangeIndex] (CustomerID, Name, CustomerSince, Age) 
  VALUES (@counter, @Name + CONVERT(VARCHAR(10), @counter), @CustomerSince+@counter, RAND()*100)
  SET @counter += 1
END
GO

Now let's execute these two queries, one with an equality operator and one with a range operator then analyze the execution plan of each of these queries:

SELECT * FROM [CustomerWithRangeIndex]
WHERE Age = 50
SELECT * FROM [CustomerWithRangeIndex]
WHERE Age BETWEEN 50 AND 60

As you can see in the below image, both of the execution plans use the range index for both the queries. Unlike with a hash index on a memory table as shown above, you can see below the range index is being used for a range selection (when using ">" or "<" operators or when using BETWEEN):

you can see below the range index is being used for range selection

If you look into the sys.indexes catalog view, it now has a new index type and description values for the hash index whereas for a range index, it uses the same information as a nonclustered B-tree index:

SELECT 'CustomerWithHashIndex' AS TableName, * FROM sys.indexes
WHERE object_id = OBJECT_ID ('CustomerWithHashIndex')
SELECT 'CustomerWithRangeIndex' AS TableName, * FROM sys.indexes
WHERE object_id = OBJECT_ID ('CustomerWithRangeIndex')

Hash Index : Type = 7 and type_desc = NONCLUSTERED HASH and for Range Index : type = 2 and type_desc = NONCLUSTERED

Please note, if you have installed SQL Server 2014 CTP1, it only supports Hash indexes. You need to install SQL Server 2014 CTP2 or the latest version to get started using Range Indexes on In-Memory OLTP tables.

Next Steps

Review the earlier tips in this series to learn more about In-memory OLTP and memory optimized tables.
Review Hash Index on MSDN.
Review refer to this research paper to learn more about Bw-tree structure used for range index.
Review Guidelines for Using Indexes on Memory-Optimized Tables on MSDN.
Review this whitepaper by Kalen Delaney.

About the author

Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips