SQL Server Data Import System to Alert For Missed Imports
We discovered that one of our SQL Server import processes (or loading process) hadn't been importing data for over a month. After looking at it, the vendor changed the schedule without notifying us, so since the new data weren't detected, the process was executing without sending a notification. Since we have numerous import processes, how can we set up a system that allows automatic notification if a process doesn't import new data into SQL Server?
When importing data, we always will have a maximum time between imports, unless we import the data type one time and never a second time. An import process reaches the maximum time when the date has exceeded the time of the schedule; for instance, a daily import would have reached its maximum time after twenty-four hours and one minute - though in certain environments this may be simply measured by reaching the second day without an import.
In certain environments, we may have patterns of this maximum time, such as a daily import, and we also may have imports that fall outside of this pattern, such as a monthly or a quarterly import. I manage over thirteen hundred import processes for a broad range of clients, from energy to finance to non-profits, and these can significantly differ. One import can be annual (adjusted water use), and another import can be at certain times of the day, only on certain days a week (Hong Kong stock exchange).
For differing times of imports, we have a couple of options to handle this:
- For each import process, notify if the maximum time has been exceeded. This option offers advantages until we have many import processes (i.e.: over a hundred) because if the schedule changes for an import, we have to update the code.
- Each import process follows a maximum time between import schedule set in a configuration table. In an environment with many import processes, if the schedule changes, we can simply update the table with the new time.
Let's first look at an example of handling an import process by itself and the pros and cons of doing so:
CREATE PROCEDURE stp_Load_CSV @cnt SMALLINT, @file VARCHAR(250), @table VARCHAR(250) AS BEGIN IF @cnt > 0 BEGIN -- Bulk insert; see http://www.mssqltips.com/sqlservertip/3208/automating-flat-file-sql-server-imports-with-powershell/ EXEC stp_CommaBulkInsert @file, @table END ELSE BEGIN DECLARE @m INT SELECT @m = MAX(loadDate) FROM OurTable ---- If the schedule changes, this must change IF @m > 31 BEGIN EXEC msdb.dbo.sp_send_dbmail @recipients = '[email protected]' , @from_address = '[email protected]' , @subject = 'ExampleLoader' , @body = 'ExampleLoader has exceeded 31 days.' END END END
In the above procedure, a process obtains the count of files needing to be loaded (@cnt), and loads them, using the stp_CommaBulkInsert provided in this tip to load them. If the count was 0 (meaning there were no files to load), it would hit the ELSE part of the IF clause and check the last date from the table. We know from looking at this that if the time exceeded 31 days it will send an email alert.
The pros to this approach:
- This would serve a smaller environment, or an environment with a few import processes, well. If we only have two import processes, this works even if it doesn't scale.
- Some import processes don't follow a day-to-day, or hour-to-hour rule, such as a quarterly import, where we can measure every 93 days. For instance, the Hong Kong stock exchange isn't open every business day, for some securities it closes for lunch, and it's not open on the weekends and holidays. Unless we create our own schedule, such as a weekly or daily import, it doesn't necessarily follow the rules and our maximum time between imports might not be as accurate as we'd like. This approach, such as coding per import, works well with an exceptional import process (though we'd have to add a bit of logic).
- For large environments, or environments growing, this offers more challenges when schedules change.
- We lose more time training new people (for applicable environments) because of the questions involving the time of the import process, updating it, and when they should know something is wrong.
- Even for those "exceptional" import processes (such as the Hong Kong stock exchange), there may still be a maximum time that we want to be notified. For instance, when I check the holiday schedule of the Hong Kong stock exchange and consider weekends, I know that if it's been 5 days since a load, more than likely, a problem exists.
We can build another solution to this problem that records each loader and its schedule in a table, as well as other information that we may want to store. In this table, we can keep records of the most recent import time and update this time each time we import data into the table. Let's look at the table structure and then the UPDATE script we would add to each loader:
---- In this table example, we are using day-level loaders, meaning that the LoaderTimeMax will be measured ---- in days. If the majority of our loaders should be measured in hours, we can measure it with hours and will ---- adjust the day loaders by (time x 24). Environments can differ, so we will want to generally use the lowest ---- measurement of time relative to the amount of loaders that unit of time covers (i.e.: 1 hourly loader and 200 daily loaders, stick with daily) CREATE TABLE tb_Loaders ( LoaderID SMALLINT, LoaderName VARCHAR(100), LoaderTable VARCHAR(100), LastLoad DATETIME, LoaderTimeMax SMALLINT, LoaderAlert VARCHAR(250), DateAdded DATE DEFAULT GETDATE() ) ---- Added at the end of each import procedure, after a successful (inside the TRY) load ---- Note that if the schedule changes, this doesn't need to change; we would only need to change the name of the table per loader UPDATE tb_Loaders SET LastLoad = GETDATE() FROM tb_Loaders -- Alternatively, we could reference the ID WHERE LoaderName = 'BitLoader'
This process updates the LastLoad date after the import process completes, which is a column stored on the tb_Loaders table. It keeps a record of some important information of this import process, such as the name of the import process (LoaderName), the table pertaining to the import process (LoaderTable), the last import date (LastLoad), the import process time maximum (LoaderTimeMax), the loader alert (who gets notified - LoaderAlert), and the date that the import process was created (DateAdded). With this table, I measure the LoaderTimeMax in days, meaning that a value of 4 would be 4 days; in general, I've experienced fewer hourly and minute import processes (generally 1% are exceptions) and the most frequent import processes that change, related to schedule, are monthly and quarterly (they tend to trade places).
With this configuration table, I immediately know a few facts: (1) the names and tables involved in the import processes, (2) their max schedules, (3) who's in charge of them (useful in a large environment), (4) and how old the import process is. If we bring someone new to the environment, they will also know these facts by looking at the table. In addition, it allows for easy change to those specific items. If the schedule changes from every three days to every seven days, I can update the table without changing the code involved in the import process, and the same applies to changing the notification email. Using this configuration table, we can send an email if the maximum time has been exceeded. In the below code, I add the import processes that have exceeded their maximum time to a temporary table, which holds the related information to send out an alert email, then iterate of the rows that exist (if any), and send out an email:
SELECT ROW_NUMBER() OVER (ORDER BY LoaderID DESC) ID , 'EXEC msdb.dbo.sp_send_dbmail @recipients = ''' + LoaderAlert + ''' , @from_address = ''' + LoaderName + '@DoNotReply.com'' , @subject = ''' + LoaderName + ' Alert'' , @body = ''' + + LoaderName + ' has exceeded its max import date by ' + CAST(LoaderTimeMax AS VARCHAR(4)) + ' days.'' , @importance = ''High''' AS MaxDetected INTO ##loopif FROM tb_Loaders WHERE LoaderTimeMax < DATEDIFF(DD,LastLoad,GETDATE()) DECLARE @id INT, @cmd VARCHAR(MAX) WHILE EXISTS (SELECT 1 FROM ##loopif) BEGIN SELECT @id = ID , @cmd = MaxDetected FROM ##loopif EXECUTE(@cmd) DELETE FROM ##loopif WHERE ID = @id END DROP TABLE ##loopif
Note that since our process updates the configuration table's LastLoad on each load, this process can and should be done on our ETL server (not production). As few notes here for developers, an alternative to the update at the end of each import would be looping through the loader table, obtaining the last date of each import process, by retrieving the last load date directly from the table, and checking that against the LoaderTimeMax; I don't use this method because I generally avoid doing checks like that on production servers, and I don't want the table on the ETL server to hold data that's not necessary.
Also, consider that when we discuss maximum time between imports, this covers the time at which we'd want to be warned, not necessarily the time that the import should occur and the nuance matters because a data vendor might tell us that their data are available for importing on the 15th of every month, except certain times, which would be the cause of a delay. I'd suggest using a timeframe that would clearly be a problem; for instance, the maximum timeframe after the 15th day of the month in the United States, assuming that the 15th was on a Friday and on Monday was a holiday, would be five days in addition to the expected time between.
Finally, I suggest keeping track of how many times the schedule changes, which you'll see below this, because it can be cross-referenced with other changes a data vendor or provider may make, and can work as additional evidence in favor or against a POC (proof of concept) involving the vendor.
---- Table that tracks how many times we've updated the schedule CREATE TABLE tb_Loaders ( LoaderID SMALLINT, LoaderName VARCHAR(100), LoaderTable VARCHAR(100), LastLoad DATETIME, LoaderTimeMax SMALLINT, LoaderScheduleUpdates SMALLINT, LoaderAlert VARCHAR(250), DateAdded DATE DEFAULT GETDATE() ) ---- Example script to update an import processes schedule for 35 days for LoaderID 1 UPDATE tb_Loaders SET LoaderTimeMax = 35 , LoaderScheduleUpdates = LoaderScheduleUpdates + 1 WHERE LoaderID = 1
- After taking inventory of your import processes, at what point would a configuration table make notifications easier?
- In your environment and industry, what timing imports tend to change?
About the author
View all my tips