Improve SQL Server Logging to Reduce Space and Troubleshooting Time
In the last year, we've evaluated how much information we're logging with database scripts that we run, which mostly involve updates, additions or removals of data along with some migrations and loads. The amount of logging has grown and in quite a few cases, it hasn't helped us solve problems faster when we face them. What are some techniques that we can do to reduce logging too much information, or logging information that helps us solve the problem faster when we do need to log information.
Part of the answer to this question depends on how scripts are executed in our environment: are we running one script at a time on the machine itself, executing scripts remotely through PowerShell or command line, or running scripts through an application that can catch output? It’s also possible that development and production match on some data, so a development test will match production, though there are just as many environments where these will be different, as data may relate to the specific environment (or be private information only allowed in production). The answer to this question will help us construct scripts that may be more helpful if we enter a situation where we need to quickly troubleshoot. It’s also entirely impossible that we run one or two UPDATE, INSERT or DELETE scripts a year, which means we may not require much additional logic if issues arise.
For this tip, we’ll look at situations where there may be multiple scripts, and this is a frequent task of running data scripts. Suppose that we have the following update transaction from a file that we must run to update 21 rows of our table.
UPDATE tbWllIfo SET WTrck = 'Reg 4' WHERE WKey = 'A4'
Whether this passes or fails, we won't know much about our transaction. We can add some logic to control its success or failure by enforcing a number of records that can be updated. In the below snippet, we set the number of rows allowed to be updated to 21, and cause a rollback if the transaction exceed the rows we set.
DECLARE @rwcont INT BEGIN TRAN UPDATE tbWllIfo SET WTrck = 'Reg 4' WHERE WKey = 'A4' SELECT @rwcont = @@ROWCOUNT ---- 21 is an enforcement check here, as only 21 rows should be updated IF @rwcont = 21 BEGIN PRINT 'Completed; rows updated ' + CAST(@rwcont AS VARCHAR(3)) COMMIT TRAN END ELSE BEGIN PRINT 'Update failed; rows attempted to be updated ' + CAST(@rwcont AS VARCHAR(3)) ROLLBACK TRAN END
This eliminates the likelihood that transactions runs which updates more rows than we want. However, if the script rolls back the transaction, what would be the next step we’d have to take? We’d query the table and look at the result to identify why there are more (or less) rows than we expected. In addition, if this script is part of multiple scripts being run for the day or evening, we would also want to know what script file it was. Finally, if we’re working in an environment with hundreds of servers and databases, we may want to quickly identify this information too.
DECLARE @rwcont INT BEGIN TRAN UPDATE tbWllIfo SET WTrck = 'Reg 4' WHERE WKey = 'A4' SELECT @rwcont = @@ROWCOUNT ---- 21 is an enforcement check here, as only 21 rows should be updated IF @rwcont = 21 BEGIN PRINT 'Completed; rows updated ' + CAST(@rwcont AS VARCHAR(3)) COMMIT TRAN END ELSE BEGIN BEGIN PRINT ' Transaction failed. Server.Database: ' + @@SERVERNAME + '.' + DB_NAME() + ' Query: SELECT COUNT(*) FROM tbWllIfo WHERE WKey = ''A4'' File: UPDATE_tbWllIfo.sql' ROLLBACK TRAN END
For the sake of this example, I’m using PRINT as the output command, but this could save these details to a logging table where the server, database, query and file information are in columns (looking at the ELSE clause since the remaining script is identical):
ELSE BEGIN PRINT 'Failed.' ROLLBACK TRAN INSERT INTO tbScriptErrors (ServerName,DatabaseName,Query,FileName) VALUES (@@SERVERNAME,DB_NAME(),'SELECT COUNT(*) FROM tbWllIfo WHERE WKey = ''A4''','UPDATE_tbWllIfo.sql') END
We can take this a step further and build in additional logging logic for some flows, such as logic for an ETL flow where data are inserted if enough records exist over a previous period or within a range. In this example, we'll use T-SQL IF statements like a switch structure, where based on a count of the most recent transactions into a table from a load, we execute the next steps. If the table has less than a particular number of rows (1000), we flag the table to wait before executing the next step. If the table has more than the rows that a load would carry, we flag the table to prevent the next steps. If the table has within the correct range of records, we flag the table to move to the next job.
Notice how only one parameter (exceeding 5000) leads to an email being sent - depending on what would be a cause for a concern, we would take an additional step above logging. This email step could also go to the data vender, or the appropriate team, if these data originated from a vender or team, if they are the only ones that can solve this issue (in the case of emailing the vender, we would include different information in the body). In general, emailing about a task completing may lead to emails being ignored; it's a best practice to email only when urgency is required to avoid emails from being ignored. In this example, the tbLoaderRuns table becomes our log relating to the flow of data into the table. The below image shows an example of this output.
/* ---- Table based on format of: CREATE TABLE tbLoaderRuns ( Loader VARCHAR(25), Flag BIT DEFAULT 0, FileCount INT, FlagTime SMALLDATETIME DEFAULT GETDATE() ) */ DECLARE @tacont INT SELECT @tacont = COUNT(*) FROM fileLoad1 WHERE TxTime > DATEADD(MI,-30,GETDATE()) ---- Flag table to prevent the next step: count too low, wait until count increases IF @tacont < 1000 BEGIN INSERT INTO tbLoaderRuns (Loader,Flag,FileCount) VALUES ('fileLoad1',0,@tacont) END ---- Flag table to prevent the next step: count too high ELSE IF @tacont > 5000 BEGIN PRINT 'Error: count too high. Email sent and table flagged.' INSERT INTO tbLoaderRuns (Loader,Flag,FileCount) VALUES ('fileLoad1',0,@tacont) EXEC msdb.dbo.sp_send_dbmail @profile_name='ETL', @email@example.com', @subject='Loader: fileLoad1', @body='Transaction count for last 30 minutes was above 5000. Query: SELECT @tacont = COUNT(*) FROM fileLoad1 WHERE TxTime > DATEADD(MI,-30,GETDATE())' END ---- Kick off the job step ELSE BEGIN INSERT INTO tbLoaderRuns (Loader,Flag,FileCount) VALUES ('fileLoad1',1,@tacont) EXEC msdb.dbo.sp_start_job N'Loader1 Data Transformation - 2' END
If we architect a design like this, we want to construct the table and design for re-use in addition to steps that are designed for automatically correcting the problem. Note that only one scenario results in us being notified - if the count is too many. If the count is too low, this step can be executed until the count crosses above the threshold. In addition, while the logging table would suffice for logging some information, such as logging information about the row count and what step the import the step moved to, this would not assist us if the application or SSIS package experienced a failure connecting to this server to access this table (unless everything is native to the local server). Because the table includes date, to prevent too much information from being (or staying) logged, we can remove data from this table frequently as needed, such as when a load completes, or on a schedule. We could also write a generic report at the end of each load, which summarizes the information from the load, emails the summary without any urgency (to be saved in a historic folder), and eliminates the logging information it summarized.
- Basic logging, such as the time a script ran and related information should be kept, even if we're able to auto-resolve some of the problems we may have or if we have more detailed information along with the logging so that we can quickly solve any problem we may see when we run a script. This will also help with situations where we may see an error that we haven't seen before, or we may not have an immediate explanation for and further information assists us in helping.
- Scripts will sometimes fail and outages unfortunately do happen. The purpose of this tip is to consider ways that we can troubleshoot an issue quickly to reduce the amount of time solving a problem when we do face issues. Trying to develop a query during an issue can increase the length of an outage, or length of an issue. In some environments, this may mean that we need to be even more detailed in our script and its output while in other environments, we may use less detail and output.
- One benefit of thinking about what to do if a script fails is that it raises our awareness of the script by enforcing us to consider what's being run. As we write more scripts, it's easy to sometimes overlook an obvious error (like in math forgetting a sign), so the practice of putting in validation checks and returning queries reduces the likelihood we write a script erroneously. This in addition to testing reduces errors.
- One point about email alerting with logging errors: if we aren't emailing the appropriate person, team or vender when an error occurs and we're emailing an intermediary step that will only pass along the information we're wasting another person's (or team's) time. The person or team who can solve the problem should be notified, unless we require extra steps for the solution. It makes little to no sense to receive alerts that then get passed on to others, as it creates a situation where people become transmitters when email already serves that functionality.
About the author
View all my tips
Article Last Updated: 2018-03-07