Populate dimension tables dynamically in SQL Server - Part 2
In my previous tip, I showed how we can make a growing logging table leaner by moving large, repeating strings to their own dimension tables. The solution there involved an AFTER INSERT trigger and assumed that we could change the applications to recognize the new table structure in relatively short order.
Well, we can't always make that assumption – sometimes the trigger must live there for a long time as we slowly update the application, and sometimes we can never update the application because it isn't ours, we no longer have the source, or various reasons businesses come up with to maintain technical debt.
If we are stuck with the trigger for handling inserts, it's probably fine, but the change serves little purpose if we can't get rid of the wide data in the main table – and we can't do that if we can't change the readers to join against the dimension tables.
While it's cleaner and more ideal to be able to change the application at the same time as the database, we can still make the change transparent to both readers and writes and make the best of this situation. In short, we can rename the table, rename the view we created in part 1, and replace the AFTER INSERT trigger on the table with an INSTEAD OF INSERT trigger on the view.
The original table was called
for the case where we could change the app, we had created this view:
CREATE VIEW dbo.vExceptionLog AS SELECT l.LogID, h.Hostname, u.URI, m.ErrorMessage, l.EventTime FROM dbo.ExceptionLog AS l INNER JOIN dbo.ExceptionHostnames AS h ON l.HostnameID = h.HostnameID INNER JOIN dbo.ExceptionURIs AS u ON l.URIID = u.URIID INNER JOIN dbo.ExceptionErrorMessages AS m ON l.ErrorMessageID = m.ErrorMessageID;
We're going to do something similar here. In one transaction, we're going to:
- Rename the table and drop the trigger
- Create a view with the original table name
- Create an instead of insert trigger on the view
- Drop the wide columns from the source table
But I'm going to talk about these separately, so they won't appear in a single code block.
Rename the table
Basically, we want the app to think nothing has changed. In order to make the app see a view that represents a join between the logging table and the dimension tables, we first need to rename the table so it can be in the background (and drop the trigger from part 1, if we created it):
EXEC sys.sp_rename N'dbo.ExceptionLog', N'ExceptionLogSource', N'OBJECT'; GO DROP TRIGGER IF EXISTS dbo.ExceptionLog_DivertDimensions; GO
Create the view
Next, we create a view that looks like the original table, both in name and output. It accomplishes this chameleon act by joining the renamed logging table to the dimension tables:
CREATE VIEW dbo.ExceptionLog AS SELECT l.LogID, h.Hostname, u.URI, m.ErrorMessage, l.EventTime FROM dbo.ExceptionLogSource AS l INNER JOIN dbo.ExceptionHostnames AS h ON l.HostnameID = h.HostnameID INNER JOIN dbo.ExceptionURIs AS u ON l.URIID = u.URIID INNER JOIN dbo.ExceptionErrorMessages AS m ON l.ErrorMessageID = m.ErrorMessageID;
Create an INSTEAD OF INSERT trigger
We need a trigger on the view so that the app pieces writing to the table won't
skip a beat. First, it will insert the new host names, URIs, and error messages
that we haven't seen yet, then it will join against all three tables to finally
insert into the new
CREATE TRIGGER dbo.ExceptionLog_DivertDimensions ON dbo.ExceptionLog INSTEAD OF INSERT AS BEGIN SET NOCOUNT ON; INSERT dbo.ExceptionHostnames(Hostname) SELECT i.Hostname FROM inserted AS i WHERE NOT EXISTS (SELECT 1 FROM dbo.ExceptionHostnames WHERE Hostname = i.Hostname) GROUP BY i.Hostname; INSERT dbo.ExceptionURIs(URI) SELECT i.URI FROM inserted AS i WHERE NOT EXISTS (SELECT 1 FROM dbo.ExceptionURIs WHERE URI = i.URI) GROUP BY i.URI; INSERT dbo.ExceptionErrorMessages(ErrorMessage) SELECT i.ErrorMessage FROM inserted AS i WHERE NOT EXISTS (SELECT 1 FROM dbo.ExceptionErrorMessages WHERE ErrorMessage = i.ErrorMessage) GROUP BY i.ErrorMessage; INSERT dbo.ExceptionLogSource(HostnameID, URIID, ErrorMessageID, EventTime) SELECT h.HostnameID, u.URIID, m.ErrorMessageID, i.EventTime FROM inserted AS i INNER JOIN dbo.ExceptionHostnames AS h ON i.Hostname = h.Hostname INNER JOIN dbo.ExceptionURIs AS u ON i.URI = u.URI INNER JOIN dbo.ExceptionErrorMessages AS m ON i.ErrorMessage = m.ErrorMessage; END GO
We can confirm that a select and insert continue to work:
SELECT TOP (1) LogID, Hostname, URI, ErrorMessage FROM dbo.ExceptionLog ORDER BY LogID DESC; INSERT dbo.ExceptionLog(Hostname, URI, ErrorMessage) VALUES(N'blip-blop', N'/go/away/', N'testing123'); SELECT TOP (1) LogID, Hostname, URI, ErrorMessage FROM dbo.ExceptionLog ORDER BY LogID DESC;
LogID Hostname URI ErrorMessage ------- --------- ----------- ----------------- 1000002 My Laptop /fake/path/ 404 for fake path LogID Hostname URI ErrorMessage ------- --------- ----------- ----------------- 1000003 blip-blop /go/away/ testing123
The insert succeeds just like before the change, and the output looks the same to the app as well.
Drop the wide columns
Now we are free to drop the wide columns from the old table (though, as mentioned in the first part, we may have to deal with other cleanup first):
ALTER TABLE dbo.ExceptionLogSource DROP COLUMN URI, Hostname, ErrorMessage;
And then we may need to rebuild to fully reclaim all the space:
ALTER TABLE dbo.ExceptionLogSource REBUILD;
If you can't perform index maintenance as a part of this operation, just keep in mind you might not see all of the impact of dropping the columns until you can.
What is the impact?
Recall that our original table was a little over 350 MB:
rows reserved data ------- --------- --------- 1000000 351368 KB 335920 KB
Now we can look at the space of our four tables and compare:
EXEC sys.sp_spaceused @objname = N'dbo.ExceptionLogSource'; EXEC sys.sp_spaceused @objname = N'dbo.ExceptionHostnames'; EXEC sys.sp_spaceused @objname = N'dbo.ExceptionURIs'; EXEC sys.sp_spaceused @objname = N'dbo.ExceptionErrorMessages';
Combined and abridged results:
name rows reserved data ---------------------- ------- -------- -------- ExceptionLogSource 1000005 46664 KB 46512 KB ExceptionHostnames 53 72 KB 8 KB ExceptionURIs 2512 328 KB 176 KB ExceptionErrorMessages 7293 1864 KB 1720 KB -------- -------- 48928 KB 48416 KB
That is a reduction of about 300 MB (an 86% space reduction). Now picture the impact of that savings if our logging table is actually 300 GB, or 3 TB, or 30 TB (yes, I've seen it).
This still isn't perfect
Like we talked about in part 1, leaving the trigger there indefinitely (to avoid any changes to the application) is not ideal. Triggers are not very discoverable, and they are certainly not known to help performance. Therefore, it will still be a good idea to plan to create a stored procedure (or update the app's ad hoc query) to handle the dimension logic before the insert without the use of a trigger. This solution at least slows the space growth for now and buys you time to plan that broader change.
- Identify places where you are writing identical strings over and over and consider normalizing those repeating values out into their own tables.
- See these tips and other resources:
- Populate dimension tables dynamically in SQL Server - Part 1
- Storing E-mail addresses more efficiently in SQL Server
- Storing E-mail addresses more efficiently in SQL Server - Part 2
- Make your SQL Server database changes backward compatible when adding a new column
- Make your SQL Server database changes backward compatible when dropping a column
- Make your SQL Server database changes backward compatible when renaming an entity
- Make your SQL Server database changes backward compatible when changing a relationship
- Making SQL Server Function Changes Backwards Compatible
About the author
View all my tips
Article Last Updated: 2022-04-08