What Happens When a Computed Column is Persisted in SQL Server?
In my last tip, I showed how you could automate the process of making a set of computed columns persisted. But what really happens under the covers when you do so?
There are a couple of things we can look at when we want to determine "what happens" when a column becomes persisted:
- Does the page structure change?
- What if there is an index on the computed column, or one is added after it is persisted? What does the index page look like before and after?
First, we have a simple table in a database we'll create just for this test:
CREATE DATABASE Persistent; GO USE Persistent; GO CREATE TABLE dbo.SimpleTable1 ( id INT IDENTITY(1,1) PRIMARY KEY, comp AS (100-id) ); INSERT dbo.SimpleTable1 DEFAULT VALUES; GO
Now, to see what the page looks like, we can use the semi-documented and absolutely unsupported
DBCC IND and
DBCC PAGE commands (recommendation: do not do this in a production database). To investigate the output of the latter, we'll also need to turn on trace flag 3604. First, let's find the pages that are represented by our clustered index (IndexID 1):
DBCC IND(N'Persistent', N'dbo.SimpleTable1', 1);
All we need is the page where type = 1 (data page). In my case (yours will most certainly be slightly different), partial results were:
PageFID PagePID IAMFID IAMPID ObjectID IndexID ... PageType 1 304 NULL NULL 373576369 1 ... 10 1 287 1 304 373576369 1 ... 1
So the page we're after is 287 - we can feed that to
DBCC PAGE and use option 3 (which includes the page header as well as individual row information):
DBCC TRACEON(3604); DBCC PAGE(N'Persistent', 1, 287, 3); -- ^ ^^^ ^ -- | ||| | ----- fileID (PageFID) --- ||| | -- ||| | ------------- PagePID ----------- | -- | -- page header + per-row info --------
The relevant part of the output:
As you can see, there is no information at all about the computed column. It should not be surprising, though, that if we make the column
ALTER TABLE dbo.SimpleTable1 ALTER COLUMN [comp] ADD PERSISTED;
...the page is directly and immediately changed, with new data for the persisted column (again, highlighting just the row-specific information in the output):
Now, what if the computed column is indexed? Let's take a look at the data page *and* the index page, in the case where the index already exists and the column is not computed:
CREATE TABLE dbo.SimpleTable2 ( id INT IDENTITY(1,1) PRIMARY KEY, comp AS (100-id) ); INSERT dbo.SimpleTable2 DEFAULT VALUES; GO CREATE INDEX x ON dbo.SimpleTable2(comp); GO
I'll skip the
DBCC IND work this time, and get right to the
DBCC PAGE output for the data page, which looks remarkably similar to the one above:
And for the index page, which has no stored information about the computed value:
Now, if we persist the column, then repeat those operations, we see again that the data page has the additional 4 bytes per row...
...and the index page has not changed significantly at all (see if you can spot the change - a hint: it's not actually stored on the page, just part of the output):
In the case where the index is created after the column is persisted, the net change is the same (except, of course, we can't inspect the index page after we've persisted the column, but before we create the index).
So, in this simple case, it looks like there is always an intrusive change to the data page, and no changes to the index pages, regardless of whether the column is persisted before or after an index on that column is created.
Like last time, we are barely scratching the surface here; I stuck to pretty simple computed columns with deterministic formulas and single-column indexes. For example, I did not get into computed columns that reference user-defined functions, or multi-column indexes, or computed columns that are included in an index, or when the table is actually a heap. What happens to the underlying data structures when those computed columns are persisted and/or indexed, I'll leave as an exercise to the reader.
For these simple cases, it is clear that some of the pages get touched immediately when a column is marked as persisted, as the data is actually placed onto disk within the relative row of the clustered index. You can envision that this is likely to introduce delays on larger tables due to data movement and logging, and can probably be a major source for page splits as the data is moved around, depending on things such as fill factor and existing fragmentation.
In my next tip, I'll look at the impact of this operation on larger tables, both with and without indexes on the computed column; for example:
- Does the operation cause a significant increase in pages or a large number of page splits?
- Does the operation create fragmentation?
- Does the operation block other queries?
- Does the operation generate significant log activity?
- Does it make more sense to create an index *after* a computed column has been persisted?
- Are these observations consistent when a persisted computed column is later made to be *not* persisted?
- Review the following tips and other resources:
- Change All Computed Columns to Persisted in SQL Server
- Using Computed Columns in SQL Server with Persisted Values
- How to create indexes on computed columns in SQL Server
- Getting creative with Computed Columns in SQL Server
- Deterministic and Nondeterministic Functions (MSDN)
- COLUMNPROPERTY (Transact-SQL) (MSDN)
- Indexes on Computed Columns (MSDN)
- How to use DBCC PAGE (Paul Randal : MSDN Blogs)
- More undocumented fun: DBCC IND, DBCC PAGE, and off-row columns (Paul Randal : MSDN Blogs)
Last Updated: 2014-09-24
About the author
View all my tips