Unraveling the Tuple Mover for SQL Server Columnstore Indexes

By:   |   Updated: 2023-08-14   |   Comments   |   Related: > Indexing


Problem

Have you heard someone talk about columnstore indexes and mention the tuple mover? Your first thought might be what the heck is that, but they've been talking too long already, so you don't want to ask them. This article explores a little-known background process used with columnstore. It's mysterious and only periodically emerges from hiding like the legendary Loch Ness Monster or Bigfoot. However, unlike Nessy, the tuple mover and Bigfoot are real. In all seriousness, a firm grasp of the tuple mover helps when you troubleshoot columnstore index issues.

Solution

This article explores the three primary tasks the tuple mover performs. If you plan to work with columnstore, understanding it will be invaluable. A couple of the questions I'll answer as we go through are what precisely the tuple mover is and why you should care. Information on the tuple mover is floating around online, but I aim to bring it into one place. Having prior knowledge of columnstore indexes will be helpful as you go through this. By the end of this article, you'll better understand a tiny but mighty SQL Server process.

What is the Tuple Mover?

Microsoft first introduced the tuple mover alongside columnstore indexes in SQL Server 2012. The simplest explanation is that it's a background process that performs maintenance on rowgroups. Whenever I think about the tuple mover, an image of a robot cleaning up after me comes to mind. Picture a Roomba but cooler. The tuple mover operates on clustered and nonclustered indexes (we'll focus on the latter). Also, most information applies to SQL Server 2016 and onward.

What Does It Do?

Do you have a hard time describing what your job entails? My story typically comes out as I work on computers. I imagine the tuple mover feels the same way when hanging out with all the other background processes. Columnstore, like any other index type, endures wear and tear from inserts, updates, and deletes. The tuple mover is here to help keep our columnstore indexes tidy. Let's go through each of the major services it provides.

Compresses Closed Rowgroups

When a rowgroup reaches 1,048,576 rows, the state changes from OPEN to CLOSED, meaning no more rows; this one is full. The tuple mover is on the lookout for CLOSED rowgroups, and once it finds them, it performs coding and compression, turning them into segments.

Removes Tombstone Rowgroups

If you delete all the rows in a rowgroup, SQL changes the state to TOMBSTONE. Additionally, when a CLOSED rowgroup switches to compressed, the original one is marked as TOMBSTONE. The tuple mover is on the lookout for these and removes them. The removal process frees up space.

Transfers Records from Buffer to the Bitmap Table

When someone deletes records from rowgroups, SQL marks them for deletion, but they are still part of the columnstore index. When the row number reaches the 1,048,576 threshold, the tuple mover wakes up and moves the records from the delete buffer to the delete bitmap table. The delete bitmap is page compressed, but the bitmap is not. Storing records in the bitmap table leads to a smaller index size.

Building the Dataset

Let's build a dataset to watch the tuple mover in action. Since I primarily work with nonclustered columnstore indexes, we'll use one here. I'm creating one table with ten million rows and eight columns with the script below. This number produces ten rowgroups, nine filled to the brim and one residual.

/* MSSQLTips.com */

USE [master];
GO

IF DATABASEPROPERTYEX(   'TupleMoverDemo',
                         'Version'
                     ) IS NOT NULL
BEGIN
    ALTER DATABASE TupleMoverDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE TupleMoverDemo;
END;
GO

CREATE DATABASE TupleMoverDemo;
GO

ALTER DATABASE TupleMoverDemo SET RECOVERY SIMPLE;
GO

USE TupleMoverDemo;
GO

CREATE TABLE dbo.SalesTrackingHistory
(
    Id INT IDENTITY(1, 1),
    SalesCode INT,
    CustomerId INT,
    OrderId INT,
    TrackingId INT,
    TrackingCount INT,
    CONSTRAINT PK_SalesTrackingHistory_ID
        PRIMARY KEY CLUSTERED (Id)
);
GO


DECLARE @UpperBound INT = 15000000;
;WITH cteN (Number)
AS (
   SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
   FROM sys.all_columns AS s1
       CROSS JOIN sys.all_columns AS s2
   )
SELECT [Number]
INTO dbo.Numbers
FROM cteN
WHERE [Number] <= @UpperBound;

CREATE UNIQUE CLUSTERED INDEX CIX_Number
ON dbo.Numbers ([Number])
WITH (FILLFACTOR = 100);


INSERT INTO dbo.SalesTrackingHistory
(
    SalesCode,
    CustomerId,
    OrderId,
    TrackingId,
    TrackingCount
)
SELECT TOP 10000000
         ABS(CHECKSUM(NEWID()) % 1000) + 10 AS SalesCode,
        ABS(CHECKSUM(NEWID()) % 2000) + 20 AS CustomerId,
        ABS(CHECKSUM(NEWID()) % 3000) + 30 AS OrderId,
        ABS(CHECKSUM(NEWID()) % 4000) + 40 AS TrackingId,
        ABS(CHECKSUM(NEWID()) % 100000) + 1000 AS TrackingCount
FROM dbo.Numbers n;
GO


CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesTrackingHistory
ON dbo.SalesTrackingHistory (
                                CustomerId,
                                TrackingCount
                            )
WITH (MAXDOP = 1);
Rowgroup results.

Watching the Tuple Mover Work

You know something called the tuple mover exists, but can you see it? Let's spend some time going through a typical day as the tuple mover.

Inserting Rows

First, we must give it a reason to get to work. I'm adding two million rows to our table in the query below.

INSERT INTO dbo.SalesTrackingHistory
(
    SalesCode,
    CustomerId,
    OrderId,
    TrackingId,
    TrackingCount
)
SELECT TOP 2000000
         ABS(CHECKSUM(NEWID()) % 1000) + 10 AS SalesCode,
        ABS(CHECKSUM(NEWID()) % 2000) + 20 AS CustomerId,
        ABS(CHECKSUM(NEWID()) % 3000) + 30 AS OrderId,
        ABS(CHECKSUM(NEWID()) % 4000) + 40 AS TrackingId,
        ABS(CHECKSUM(NEWID()) % 100000) + 1000 AS TrackingCount
FROM dbo.Numbers n;
GO

Now let's run the helpful query below to see if it leaves any clues.

SELECT row_group_id,
       state_desc,
       total_rows,
       deleted_rows,
       trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats;
Rowgroup with a Closed State
Rowgroup with a Tombstone state

The first time we ran the query, the DMV returned 12 records, and one rowgroup was closed. SQL adds a record for our deltastore where the leftovers go. If we give the tuple mover about five minutes, it changes the CLOSED rowgroup to COMPRESSED. Once compressed, the previous rowgroup displays the TOMBSTONE state. In another five minutes, the tuple mover deletes the TOMBSTONE rowgroup.

Deleting or Updating Rows

It's time to look at what causes fragmentation in columnstore indexes: deleting or updating rows. We'll stick with deleting in this example since the concepts are identical. When you update a row in a columnstore index, SQL marks the old row as deleted and inserts the new one into the deltastore.

The query below deletes two million rows from our table.

;WITH cte_SalesTrackingHistory
AS (
   SELECT TOP 2000000
          *
   FROM dbo.SalesTrackingHistory
   ORDER BY Id ASC
   )
DELETE FROM cte_SalesTrackingHistory;
GO

If we look at our rowgroups again, we notice that deleted rows are zero. What's going on?

Rowgroups missing deleted rows.

This is where the tuple mover comes into play, specifically moving the deleted buffer to the bitmap. However, this action only occurs when at least 1,048,576 rows reside in the buffer.

Using the query below, we can look at the rows in the deleted buffer and bitmap. The primary DMV is the sys.internal_partitions. Remember you must wait about five minutes for the tuple mover to wake up and move the rows from the buffer to the bitmap.

SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.[name] AS IndexName,
       p.[internal_object_type_desc] AS [Description],
       p.[rows] AS [RowCount],
       p.[data_compression_desc] AS [CompressionType]
FROM [sys].[internal_partitions] AS p
    INNER JOIN [sys].[indexes] AS i
        ON p.[object_id] = i.[object_id]
           AND p.[index_id] = i.[index_id];
Reviewing the deleted buffer.

If we rerun our original query after taking a coffee break, the deleted rows appear.

Rowgroup with deleted rows.

How do we get rid of those deleted rows? By performing either a rebuild or reorganization; the latter is my preferred method.

ALTER INDEX NCCI_SalesTrackingHistory
ON dbo.SalesTrackingHistory
REORGANIZE;
GO
The tuple mover cleaning up rowgroups.

It looks like some TOMBSTONE rowgroups are back. Guess who's on deck? That's right, your trusty tuple mover.

A nice clean set of rowgroups.

Once again, the tuple mover is at work keeping our rowgroups nice and tidy.

Closing Thoughts

What would happen if the tuple mover didn't exist? The most obvious thing is that we would need to rebuild our indexes more frequently. There is a trace flag you can use to deactivate it. However, unless Microsoft support directs you to, I strongly advise against it.

The last item I wanted to mention is that the behavior of nonclustered and clustered indexes differ. Look for any tips on feature functionality when reviewing Microsoft documentation.

When I started working with columnstore, it was slim pickings regarding the documentation available on the tuple mover. However, Niko Neugebauer's blog was a lifesaver for troubleshooting issues. How about you? Have you ever had to troubleshoot a problem with columnstore indexes?

Key Takeaways

  • The tuple mover is a process for keeping your rowgroups tidy, like a well-cut golf course lawn. One of its primary duties is compressing closed rowgroups and turning them into segments.
  • Remember, the tuple mover takes around five minutes to turn on and jump into action. Give it time if you don't see your deleted rows show up immediately.
  • A handy DMV for reviewing your rowgroups is sys.dm_db_column_store_row_group_physical_stats. If you don't see deleted rows, there must be 1,048,576 in the buffer first.
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jared Westover Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-08-14

Comments For This Article

















get free sql tips
agree to terms