Denormalize For Performance


By:
Overview

Even though it might mean storing a bit of redundant data, schema denormalization can sometimes provide better query performance. The only question then becomes is the extra space used worth the performance benefit.

Explanation

Before we get started on the example let's make sure our join columns are indexed (as we saw in an earlier topic) so the performance results are not skewed by any scans. Here are the SQL statements to create these indexes.

CREATE NONCLUSTERED INDEX idxChild_ParentID
ON [dbo].[Child] ([ParentID])

CREATE NONCLUSTERED INDEX idxChildDetail_ChildID
ON [dbo].[ChildDetail] ([ChildID])

To test the performance of both our normalized and denormalized schemas we'll use the following simple 3 table join query.

SELECT * 
  FROM [dbo].[Parent] P INNER JOIN
       [dbo].[Child] C ON P.ParentID=C.ParentID INNER JOIN
       [dbo].[ChildDetail] CD ON C.ChildID=CD.ChildID
WHERE P.ParentID=32433

Looking at the explain plan for this query it behaves just as we would suspect, using index seeks and lookups as it joins each table in the query.

Explain Plan - Normalized

Now let's do some denormalization by moving the ChildDetail table data into the Child table. We'll first need to add the required columns to the Child table. Then before we can migrate any data we'll need to remove the primary and foreign key constraints and once the data is migrated we can recreate them. The following SQL statements perform these tasks.

ALTER TABLE [dbo].[Child] ADD [ChildDetailID] [bigint] NOT NULL DEFAULT 0,[ExtraDataColumn] [bigint]

ALTER TABLE [dbo].[ChildDetail] DROP CONSTRAINT [FK_ChildDetail_Child]

ALTER TABLE [dbo].[Child] DROP CONSTRAINT [PK_Child]

INSERT INTO [dbo].[Child] 
   SELECT C.ChildID,C.ParentID,C.IntDataColumn,C.VarcharDataColumn,
          CD.ChildDetailID,CD.ExtraDataColumn
    FROM [dbo].[Child] C INNER JOIN 
         [dbo].[ChildDetail] CD ON C.ChildID=CD.ChildID

DELETE FROM Child where ChildDetailID=0

ALTER TABLE [dbo].[Child] ADD  CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED 
   ([ChildID] ASC, [ChildDetailID] ASC)

DROP TABLE [dbo].[ChildDetail]

We'll also need to update the query used above as we no longer need to access the ChildDetail table. Here is the updated query.

SELECT * 
  FROM [dbo].[Parent] P INNER JOIN
       [dbo].[Child] C ON P.ParentID=C.ParentID
 WHERE P.ParentID=32433

The explain plan for this query looks very similar to the original one just we no longer have the join with the ChildDetail table.

Explain Plan - Denormalized

Looking at the SQL Profiler results from these two queries we do see a pretty big benefit from removing the join of the ChildDetail table. SQL Server performed fewer reads and the total execution time was also improved.

CPU Reads Writes Duration
Normalized 0 365 0 75
Denormalized 0 250 0 5

We should also take a look at how much extra space we are using as this is important in deciding whether or not to implement this type of change. The following SQL statement will tell you the amount of disk space each of your tables is consuming.

SELECT o.name,SUM(reserved_page_count) * 8.0 / 1024 AS 'Size (MB)'
  FROM sys.dm_db_partition_stats ddps INNER JOIN
       sys.objects o ON ddps.object_id=o.object_id
 WHERE o.name in ('Parent','Child','ChildDetail')
 GROUP BY o.name

The following table shows the results of the above query for both the normalized and denormalized table schemas. As we can see the denormalized table schema does use about 18MB more disk space. The only question now becomes, is the performance benefit worth the space this redundant data is holding.

Table Normalized Size (MB) Denormalized Size (MB)
Parent 5.9 5.9
Child 151.6 679.2
ChildDetail 509.6 N/A
Total 667.1 685.1
Additional Information





Comments For This Article

















get free sql tips
agree to terms