Cleanup and Enhance Data in SQL Server Dimension Tables

By: Aaron Bertrand | Updated: 2022-04-08 | Comments | Related: > Database Design

Problem

In my previous tip, I showed how we can make a growing logging table leaner by moving large, repeating strings to their own dimension tables. The solution there involved an AFTER INSERT trigger and assumed that we could change the applications to recognize the new table structure in relatively short order.

Well, we can't always make that assumption – sometimes the trigger must live there for a long time as we slowly update the application, and sometimes we can never update the application because it isn't ours, we no longer have the source, or various reasons businesses come up with to maintain technical debt.

If we are stuck with the trigger for handling inserts, it's probably fine, but the change serves little purpose if we can't get rid of the wide data in the main table – and we can't do that if we can't change the readers to join against the dimension tables.

Solution

While it's cleaner and more ideal to be able to change the application at the same time as the database, we can still make the change transparent to both readers and writes and make the best of this situation. In short, we can rename the table, rename the view we created in part 1, and replace the AFTER INSERT trigger on the table with an INSTEAD OF INSERT trigger on the view.

The original table was called ExceptionLog and, for the case where we could change the app, we had created this view:

CREATE VIEW dbo.vExceptionLog
AS
  SELECT l.LogID,
         h.Hostname,
         u.URI,
         m.ErrorMessage,
         l.EventTime
  FROM dbo.ExceptionLog AS l
  INNER JOIN dbo.ExceptionHostnames AS h
    ON l.HostnameID = h.HostnameID
  INNER JOIN dbo.ExceptionURIs AS u
    ON l.URIID = u.URIID
  INNER JOIN dbo.ExceptionErrorMessages AS m
    ON l.ErrorMessageID = m.ErrorMessageID;

We're going to do something similar here. In one transaction, we're going to:

Rename the table and drop the trigger
Create a view with the original table name
Create an instead of insert trigger on the view
Drop the wide columns from the source table

But I'm going to talk about these separately, so they won't appear in a single code block.

Rename the table

Basically, we want the app to think nothing has changed. In order to make the app see a view that represents a join between the logging table and the dimension tables, we first need to rename the table so it can be in the background (and drop the trigger from part 1, if we created it):

EXEC sys.sp_rename N'dbo.ExceptionLog', N'ExceptionLogSource', N'OBJECT';
GO

DROP TRIGGER IF EXISTS dbo.ExceptionLog_DivertDimensions;
GO

Create the view

Next, we create a view that looks like the original table, both in name and output. It accomplishes this chameleon act by joining the renamed logging table to the dimension tables:

CREATE VIEW dbo.ExceptionLog
AS
  SELECT l.LogID,
         h.Hostname,
         u.URI,
         m.ErrorMessage,
         l.EventTime
  FROM dbo.ExceptionLogSource AS l
  INNER JOIN dbo.ExceptionHostnames AS h
    ON l.HostnameID = h.HostnameID
  INNER JOIN dbo.ExceptionURIs AS u
    ON l.URIID = u.URIID
  INNER JOIN dbo.ExceptionErrorMessages AS m
    ON l.ErrorMessageID = m.ErrorMessageID;

Create an INSTEAD OF INSERT trigger

We need a trigger on the view so that the app pieces writing to the table won't skip a beat. First, it will insert the new host names, URIs, and error messages that we haven't seen yet, then it will join against all three tables to finally insert into the new ExceptionLogSource table.

CREATE TRIGGER dbo.ExceptionLog_DivertDimensions
ON dbo.ExceptionLog
INSTEAD OF INSERT
AS
BEGIN
  SET NOCOUNT ON;

  INSERT dbo.ExceptionHostnames(Hostname) 
    SELECT i.Hostname FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ExceptionHostnames
      WHERE Hostname = i.Hostname) GROUP BY i.Hostname;

  INSERT dbo.ExceptionURIs(URI) 
    SELECT i.URI FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ExceptionURIs
      WHERE URI = i.URI) GROUP BY i.URI;

  INSERT dbo.ExceptionErrorMessages(ErrorMessage) 
    SELECT i.ErrorMessage FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ExceptionErrorMessages
      WHERE ErrorMessage = i.ErrorMessage) GROUP BY i.ErrorMessage;

  INSERT dbo.ExceptionLogSource(HostnameID, URIID, ErrorMessageID, EventTime)
    SELECT h.HostnameID, u.URIID, m.ErrorMessageID, i.EventTime
    FROM inserted AS i
    INNER JOIN dbo.ExceptionHostnames     AS h ON i.Hostname     = h.Hostname
    INNER JOIN dbo.ExceptionURIs          AS u ON i.URI          = u.URI
    INNER JOIN dbo.ExceptionErrorMessages AS m ON i.ErrorMessage = m.ErrorMessage;
END
GO

We can confirm that a select and insert continue to work:

SELECT TOP (1) LogID, Hostname, URI, ErrorMessage
  FROM dbo.ExceptionLog ORDER BY LogID DESC;

INSERT dbo.ExceptionLog(Hostname, URI, ErrorMessage) 
  VALUES(N'blip-blop', N'/go/away/', N'testing123');

SELECT TOP (1) LogID, Hostname, URI, ErrorMessage
  FROM dbo.ExceptionLog ORDER BY LogID DESC;

Results:

LogID     Hostname    URI           ErrorMessage
-------   ---------   -----------   -----------------
1000002   My Laptop   /fake/path/   404 for fake path

LogID     Hostname    URI           ErrorMessage
-------   ---------   -----------   -----------------
1000003   blip-blop   /go/away/     testing123

The insert succeeds just like before the change, and the output looks the same to the app as well.

Drop the wide columns

Now we are free to drop the wide columns from the old table (though, as mentioned in the first part, we may have to deal with other cleanup first):

ALTER TABLE dbo.ExceptionLogSource DROP COLUMN URI, Hostname, ErrorMessage;

And then we may need to rebuild to fully reclaim all the space:

ALTER TABLE dbo.ExceptionLogSource REBUILD;

If you can't perform index maintenance as a part of this operation, just keep in mind you might not see all of the impact of dropping the columns until you can.

What is the impact?

Recall that our original table was a little over 350 MB:

rows      reserved    data 
-------   ---------   ---------
1000000   351368 KB   335920 KB

Now we can look at the space of our four tables and compare:

EXEC sys.sp_spaceused @objname = N'dbo.ExceptionLogSource';
EXEC sys.sp_spaceused @objname = N'dbo.ExceptionHostnames';
EXEC sys.sp_spaceused @objname = N'dbo.ExceptionURIs';
EXEC sys.sp_spaceused @objname = N'dbo.ExceptionErrorMessages';

Combined and abridged results:

name                     rows      reserved   data
----------------------   -------   --------   --------
ExceptionLogSource       1000005   46664 KB   46512 KB 
ExceptionHostnames            53      72 KB       8 KB
ExceptionURIs               2512     328 KB     176 KB
ExceptionErrorMessages      7293    1864 KB    1720 KB
                                   --------   --------
                                   48928 KB   48416 KB

That is a reduction of about 300 MB (an 86% space reduction). Now picture the impact of that savings if our logging table is actually 300 GB, or 3 TB, or 30 TB (yes, I've seen it).

This still isn't perfect

Like we talked about in part 1, leaving the trigger there indefinitely (to avoid any changes to the application) is not ideal. Triggers are not very discoverable, and they are certainly not known to help performance. Therefore, it will still be a good idea to plan to create a stored procedure (or update the app's ad hoc query) to handle the dimension logic before the insert without the use of a trigger. This solution at least slows the space growth for now and buys you time to plan that broader change.

Next Steps

Identify places where you are writing identical strings over and over and consider normalizing those repeating values out into their own tables.
See these tips and other resources:

About the author

Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2022-04-08

Populate dimension tables dynamically in SQL Server - Part 2