Denormalize For Performance
By: Ben Snaidero
Even though it might mean storing a bit of redundant data, schema denormalization can sometimes provide better query performance. The only question then becomes is the extra space used worth the performance benefit.
Before we get started on the example let's make sure our join columns are indexed (as we saw in an earlier topic) so the performance results are not skewed by any scans. Here are the SQL statements to create these indexes.
CREATE NONCLUSTERED INDEX idxChild_ParentID ON [dbo].[Child] ([ParentID]) CREATE NONCLUSTERED INDEX idxChildDetail_ChildID ON [dbo].[ChildDetail] ([ChildID])
To test the performance of both our normalized and denormalized schemas we'll use the following simple 3 table join query.
SELECT * FROM [dbo].[Parent] P INNER JOIN [dbo].[Child] C ON P.ParentID=C.ParentID INNER JOIN [dbo].[ChildDetail] CD ON C.ChildID=CD.ChildID WHERE P.ParentID=32433
Looking at the explain plan for this query it behaves just as we would suspect, using index seeks and lookups as it joins each table in the query.
Now let's do some denormalization by moving the ChildDetail table data into the Child table. We'll first need to add the required columns to the Child table. Then before we can migrate any data we'll need to remove the primary and foreign key constraints and once the data is migrated we can recreate them. The following SQL statements perform these tasks.
ALTER TABLE [dbo].[Child] ADD [ChildDetailID] [bigint] NOT NULL DEFAULT 0,[ExtraDataColumn] [bigint] ALTER TABLE [dbo].[ChildDetail] DROP CONSTRAINT [FK_ChildDetail_Child] ALTER TABLE [dbo].[Child] DROP CONSTRAINT [PK_Child] INSERT INTO [dbo].[Child] SELECT C.ChildID,C.ParentID,C.IntDataColumn,C.VarcharDataColumn, CD.ChildDetailID,CD.ExtraDataColumn FROM [dbo].[Child] C INNER JOIN [dbo].[ChildDetail] CD ON C.ChildID=CD.ChildID DELETE FROM Child where ChildDetailID=0 ALTER TABLE [dbo].[Child] ADD CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED ([ChildID] ASC, [ChildDetailID] ASC) DROP TABLE [dbo].[ChildDetail]
We'll also need to update the query used above as we no longer need to access the ChildDetail table. Here is the updated query.
SELECT * FROM [dbo].[Parent] P INNER JOIN [dbo].[Child] C ON P.ParentID=C.ParentID WHERE P.ParentID=32433
The explain plan for this query looks very similar to the original one just we no longer have the join with the ChildDetail table.
Looking at the SQL Profiler results from these two queries we do see a pretty big benefit from removing the join of the ChildDetail table. SQL Server performed fewer reads and the total execution time was also improved.
We should also take a look at how much extra space we are using as this is important in deciding whether or not to implement this type of change. The following SQL statement will tell you the amount of disk space each of your tables is consuming.
SELECT o.name,SUM(reserved_page_count) * 8.0 / 1024 AS 'Size (MB)' FROM sys.dm_db_partition_stats ddps INNER JOIN sys.objects o ON ddps.object_id=o.object_id WHERE o.name in ('Parent','Child','ChildDetail') GROUP BY o.name
The following table shows the results of the above query for both the normalized and denormalized table schemas. As we can see the denormalized table schema does use about 18MB more disk space. The only question now becomes, is the performance benefit worth the space this redundant data is holding.
|Table||Normalized Size (MB)||Denormalized Size (MB)|