Unraveling the Tuple Mover for SQL Server Columnstore Indexes
Have you heard someone talk about columnstore indexes and mention the tuple mover? Your first thought might be what the heck is that, but they've been talking too long already, so you don't want to ask them. This article explores a little-known background process used with columnstore. It's mysterious and only periodically emerges from hiding like the legendary Loch Ness Monster or Bigfoot. However, unlike Nessy, the tuple mover and Bigfoot are real. In all seriousness, a firm grasp of the tuple mover helps when you troubleshoot columnstore index issues.
This article explores the three primary tasks the tuple mover performs. If you plan to work with columnstore, understanding it will be invaluable. A couple of the questions I'll answer as we go through are what precisely the tuple mover is and why you should care. Information on the tuple mover is floating around online, but I aim to bring it into one place. Having prior knowledge of columnstore indexes will be helpful as you go through this. By the end of this article, you'll better understand a tiny but mighty SQL Server process.
What is the Tuple Mover?
Microsoft first introduced the tuple mover alongside columnstore indexes in SQL Server 2012. The simplest explanation is that it's a background process that performs maintenance on rowgroups. Whenever I think about the tuple mover, an image of a robot cleaning up after me comes to mind. Picture a Roomba but cooler. The tuple mover operates on clustered and nonclustered indexes (we'll focus on the latter). Also, most information applies to SQL Server 2016 and onward.
What Does It Do?
Do you have a hard time describing what your job entails? My story typically comes out as I work on computers. I imagine the tuple mover feels the same way when hanging out with all the other background processes. Columnstore, like any other index type, endures wear and tear from inserts, updates, and deletes. The tuple mover is here to help keep our columnstore indexes tidy. Let's go through each of the major services it provides.
Compresses Closed Rowgroups
When a rowgroup reaches 1,048,576 rows, the state changes from OPEN to CLOSED, meaning no more rows; this one is full. The tuple mover is on the lookout for CLOSED rowgroups, and once it finds them, it performs coding and compression, turning them into segments.
Removes Tombstone Rowgroups
If you delete all the rows in a rowgroup, SQL changes the state to TOMBSTONE. Additionally, when a CLOSED rowgroup switches to compressed, the original one is marked as TOMBSTONE. The tuple mover is on the lookout for these and removes them. The removal process frees up space.
Transfers Records from Buffer to the Bitmap Table
When someone deletes records from rowgroups, SQL marks them for deletion, but they are still part of the columnstore index. When the row number reaches the 1,048,576 threshold, the tuple mover wakes up and moves the records from the delete buffer to the delete bitmap table. The delete bitmap is page compressed, but the bitmap is not. Storing records in the bitmap table leads to a smaller index size.
Building the Dataset
Let's build a dataset to watch the tuple mover in action. Since I primarily work with nonclustered columnstore indexes, we'll use one here. I'm creating one table with ten million rows and eight columns with the script below. This number produces ten rowgroups, nine filled to the brim and one residual.
/* MSSQLTips.com */ USE [master]; GO IF DATABASEPROPERTYEX( 'TupleMoverDemo', 'Version' ) IS NOT NULL BEGIN ALTER DATABASE TupleMoverDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE TupleMoverDemo; END; GO CREATE DATABASE TupleMoverDemo; GO ALTER DATABASE TupleMoverDemo SET RECOVERY SIMPLE; GO USE TupleMoverDemo; GO CREATE TABLE dbo.SalesTrackingHistory ( Id INT IDENTITY(1, 1), SalesCode INT, CustomerId INT, OrderId INT, TrackingId INT, TrackingCount INT, CONSTRAINT PK_SalesTrackingHistory_ID PRIMARY KEY CLUSTERED (Id) ); GO DECLARE @UpperBound INT = 15000000; ;WITH cteN (Number) AS ( SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id]) FROM sys.all_columns AS s1 CROSS JOIN sys.all_columns AS s2 ) SELECT [Number] INTO dbo.Numbers FROM cteN WHERE [Number] <= @UpperBound; CREATE UNIQUE CLUSTERED INDEX CIX_Number ON dbo.Numbers ([Number]) WITH (FILLFACTOR = 100); INSERT INTO dbo.SalesTrackingHistory ( SalesCode, CustomerId, OrderId, TrackingId, TrackingCount ) SELECT TOP 10000000 ABS(CHECKSUM(NEWID()) % 1000) + 10 AS SalesCode, ABS(CHECKSUM(NEWID()) % 2000) + 20 AS CustomerId, ABS(CHECKSUM(NEWID()) % 3000) + 30 AS OrderId, ABS(CHECKSUM(NEWID()) % 4000) + 40 AS TrackingId, ABS(CHECKSUM(NEWID()) % 100000) + 1000 AS TrackingCount FROM dbo.Numbers n; GO CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesTrackingHistory ON dbo.SalesTrackingHistory ( CustomerId, TrackingCount ) WITH (MAXDOP = 1);
Watching the Tuple Mover Work
You know something called the tuple mover exists, but can you see it? Let's spend some time going through a typical day as the tuple mover.
First, we must give it a reason to get to work. I'm adding two million rows to our table in the query below.
INSERT INTO dbo.SalesTrackingHistory ( SalesCode, CustomerId, OrderId, TrackingId, TrackingCount ) SELECT TOP 2000000 ABS(CHECKSUM(NEWID()) % 1000) + 10 AS SalesCode, ABS(CHECKSUM(NEWID()) % 2000) + 20 AS CustomerId, ABS(CHECKSUM(NEWID()) % 3000) + 30 AS OrderId, ABS(CHECKSUM(NEWID()) % 4000) + 40 AS TrackingId, ABS(CHECKSUM(NEWID()) % 100000) + 1000 AS TrackingCount FROM dbo.Numbers n; GO
Now let's run the helpful query below to see if it leaves any clues.
SELECT row_group_id, state_desc, total_rows, deleted_rows, trim_reason_desc FROM sys.dm_db_column_store_row_group_physical_stats;
The first time we ran the query, the DMV returned 12 records, and one rowgroup was closed. SQL adds a record for our deltastore where the leftovers go. If we give the tuple mover about five minutes, it changes the CLOSED rowgroup to COMPRESSED. Once compressed, the previous rowgroup displays the TOMBSTONE state. In another five minutes, the tuple mover deletes the TOMBSTONE rowgroup.
Deleting or Updating Rows
It's time to look at what causes fragmentation in columnstore indexes: deleting or updating rows. We'll stick with deleting in this example since the concepts are identical. When you update a row in a columnstore index, SQL marks the old row as deleted and inserts the new one into the deltastore.
The query below deletes two million rows from our table.
;WITH cte_SalesTrackingHistory AS ( SELECT TOP 2000000 * FROM dbo.SalesTrackingHistory ORDER BY Id ASC ) DELETE FROM cte_SalesTrackingHistory; GO
If we look at our rowgroups again, we notice that deleted rows are zero. What's going on?
This is where the tuple mover comes into play, specifically moving the deleted buffer to the bitmap. However, this action only occurs when at least 1,048,576 rows reside in the buffer.
Using the query below, we can look at the rows in the deleted buffer and bitmap. The primary DMV is the sys.internal_partitions. Remember you must wait about five minutes for the tuple mover to wake up and move the rows from the buffer to the bitmap.
SELECT OBJECT_NAME(i.object_id) AS TableName, i.[name] AS IndexName, p.[internal_object_type_desc] AS [Description], p.[rows] AS [RowCount], p.[data_compression_desc] AS [CompressionType] FROM [sys].[internal_partitions] AS p INNER JOIN [sys].[indexes] AS i ON p.[object_id] = i.[object_id] AND p.[index_id] = i.[index_id];
If we rerun our original query after taking a coffee break, the deleted rows appear.
How do we get rid of those deleted rows? By performing either a rebuild or reorganization; the latter is my preferred method.
ALTER INDEX NCCI_SalesTrackingHistory ON dbo.SalesTrackingHistory REORGANIZE; GO
It looks like some TOMBSTONE rowgroups are back. Guess who's on deck? That's right, your trusty tuple mover.
Once again, the tuple mover is at work keeping our rowgroups nice and tidy.
What would happen if the tuple mover didn't exist? The most obvious thing is that we would need to rebuild our indexes more frequently. There is a trace flag you can use to deactivate it. However, unless Microsoft support directs you to, I strongly advise against it.
The last item I wanted to mention is that the behavior of nonclustered and clustered indexes differ. Look for any tips on feature functionality when reviewing Microsoft documentation.
When I started working with columnstore, it was slim pickings regarding the documentation available on the tuple mover. However, Niko Neugebauer's blog was a lifesaver for troubleshooting issues. How about you? Have you ever had to troubleshoot a problem with columnstore indexes?
- The tuple mover is a process for keeping your rowgroups tidy, like a well-cut golf course lawn. One of its primary duties is compressing closed rowgroups and turning them into segments.
- Remember, the tuple mover takes around five minutes to turn on and jump into action. Give it time if you don't see your deleted rows show up immediately.
- A handy DMV for reviewing your rowgroups is sys.dm_db_column_store_row_group_physical_stats. If you don't see deleted rows, there must be 1,048,576 in the buffer first.
- Would you like a brief overview of the differences between columnstore and rowstore storage? I wrote an article, SQL Server Storage Modes - Choosing Rowstore or Columnstore.
- If you want to dive deep into everything columnstore, Edward Pollack wrote a series of articles on the Redgate site.
- Fikrat Azizoz wrote an excellent article about Identifying the best tables for SQL Server 2016 Columnstore Index Migration.
About the author
View all my tips
Article Last Updated: 2023-08-14