![]() |
|
|
By: Ray Barley | Read Comments (25) | Print Ray is a Principal Architect at RDA Corporation and a MSSQLTips.com BI Expert. Related Tips: 1 | 2 | More |
|
Problem
We have quite a few SSIS packages that we run on a regular basis to extract data from our OLTP systems and transform/load into our data warehouse. We would like to capture some basic information about these SSIS packages such as what packages were run, when, how long did they take to run, and how many rows were extracted and loaded. We have looked in to the built-in logging that comes with SSIS but we just don't see a straight forward way to get row counts. Are we missing something?
Solution
SSIS provides a flexible logging mechanism that is tightly integrated with the various events raised during package execution. However, the issue that you raise is accurate and the best approach is to use the built-in SSIS logging where appropriate and augment it with your own custom logging. A simple, straight-forward custom logging schema can provide the information you need. Let's take a look at a sample schema that can be used to capture the basic logging information that you have requested:
|
Sample SSIS Logging Data Model |
|
|
Main points about the above schema:
The pattern for populating the logging tables is to implement an "Init" and an "End" stored procedure for each table. The stored procedures are executed in the SSIS package using the Execute SQL task. The PackageLog stored procedures are as follows:
Packages that extract data from a table in an OLTP system will update the ExtractLog table as follows:
Packages that update dimension or fact tables in the data warehouse will update the LoadLog table using the same pattern; i.e. stp_InitLoadLog and stp_EndLoadLog. The LoadLog table records the number of rows that were inserted, updated or deleted for a given dimension or fact table in the warehouse.
Now let's take a look at a simple SSIS package that implements the custom logging. We'll use the AdventureWorks sample database as our OLTP source, extract rows from the Sales.Customer table, and save the rows in a staging table. The following is the Control Flow for the package:
|
SSIS Control Flow |
|
|
Next let's drill in to the Data Flow:
|
SSIS Data Flow |
|
|
The Get Updated Customers task selects the rows from the Sales.Customer table where the ModifiedDate is greater than the maximum ModifiedDate the last time the package was run. The maximum ModifiedDate is stored in the LastExtractDateTime column in the ExtractLog table and returned in the call to the stp_InitExtractLog stored procedure (Init Extract Log task in the Control Flow). The Get Extract Row Count task simply assigns the number of rows passing through to a package variable; the row count is passed as a parameter to the stp_EndExtractLog stored procedure (End Extract Log task in Control Flow) and stored in the ExtractLog table.
Next Steps
| Share: | Share | Tweet |
|
![]() |
![]() |
Free SQL Server Learning |
| Tuesday, July 29, 2008 - 4:08:37 AM - Patricia Martinez | Read The Tip |
|
Thanks a LOT for your tip!!! It has been very helpful to customize my log files, and it has given me ideas to update the log files in a structured way.
Patricia
|
|
| Wednesday, May 11, 2011 - 8:11:21 AM - Jason Yousef | Read The Tip |
|
Great article, thanks for sharing the knowledge, we use the same, but we even log the updated, inserted, and unchanged counts, along with the extracted counts of course. |
|
| Friday, July 01, 2011 - 9:46:32 AM - Johanna | Read The Tip |
|
This works fine, thanks for sharing! Does someone have an example of a stored procedure that will log the updated, inserted and deleted records in dimension and fact tables? I use slowly changing dimensions in SSIS 2008. /Johanna |
|
| Friday, July 01, 2011 - 10:16:39 AM - Ray Barley | Read The Tip |
|
Take a look at the OUTPUT clause: http://msdn.microsoft.com/en-us/library/ms177564.aspx This allows you to "output" to another table when you're doing an insert, update or delete. The above page shows a bunch of examples that output into a table variable but you can use a regular table as well. This is essentially what you could do in a trigger; this just allows you to do it without having the trigger.
|
|
| Friday, July 01, 2011 - 4:13:50 PM - Jeremy Kadlec | Read The Tip |
|
Johanna, Here is a tip on the OUTPUT clause - http://www.mssqltips.com/tip.asp?tip=1381. Thank you, |
|
| Monday, July 04, 2011 - 2:39:16 AM - Johanna | Read The Tip |
|
Thanks a lot! The OUTPUT clause was very useful. /Johanna |
|
| Thursday, April 12, 2012 - 5:08:12 PM - KRK | Read The Tip |
|
Great article. Which version of the AdventureWorks Database do I need and where can I get the MSSQLTIPS DB? Thanks. |
|
| Thursday, April 12, 2012 - 5:37:31 PM - Ray Barley | Read The Tip |
|
The download (link is in the Next Steps section of the tip) has a text file with the T-SQL statements to create the database objects. You can create a database called MSSQLTIPS and run the T-SQL commands in it. As far as AdventureWorks this tip is pretty old so it was probably the version that came with SQL Server 2005. I don't think any of the logging is dependent on having adventureworks; in other words you test the logging and not even have the Extract Customers data flow. There is a package variable that gets set with the number of rows extracted but that's all that happens in the data flow as far as what affects the logging.
|
|
| Thursday, April 12, 2012 - 7:00:59 PM - krk | Read The Tip |
|
Thanks for the reply. Before and after each load I must got a count from the source system and staging. If they do not match I need to record the mismatch. I did something like this in DTS in 2004 but I can't remeber exactly how I did it. I seem to remeber having an Execute SQL Task to output the counts as output parameters and save in a global variable and log the error in the next task. I want to keep the table structures as generic as possible but I still pondering the design. Thank again.
|
|
| Friday, April 13, 2012 - 8:32:27 AM - KRK | Read The Tip |
|
I ran your create_db_objects.txt script and it is missing some tables. Please advise. What file format is the CustomLogging.database? I was hoping to get the correct version of AdventureWorks and MSSQLTIPS Database. Thank you. Msg 4902, Level 16, State 1, Line 1 Cannot find the object "etl.ExtractLog" because it does not exist or you do not have permissions.
Msg 4902, Level 16, State 1, Line 1
Cannot find the object "etl.ExtractLog" because it does not exist or you do not have permissions.
Msg 4902, Level 16, State 1, Line 2
Cannot find the object "etl.LoadLog" because it does not exist or you do not have permissions.
Msg 4902, Level 16, State 1, Line 1
Cannot find the object "etl.LoadLog" because it does not exist or you do not have permissions.
|
|
| Friday, April 13, 2012 - 10:22:00 AM - KRK | Read The Tip |
|
I created the DDL for the tables based on your ERD. CREATE SCHEMA etl GO CREATE TABLE etl.PackageLog(
CREATE TABLE etl.LoadLog ( Unfortunately I do not understand how to configure a RowCount Task.
|
|
| Friday, April 13, 2012 - 10:37:34 AM - Ray Barley | Read The Tip |
|
You define a package variable (typically at the package level) then add a row count transformation in your data flow after a data source; i.e. you read data from somewhere then each row flows through the row count. You give the row count your package variable and it populates it with the number of rows that you retrieved. Here are the details: http://msdn.microsoft.com/en-us/library/ms141136.aspx
|
|
| Friday, April 13, 2012 - 12:05:46 PM - KRK | Read The Tip |
|
Thanks for the Link on Row Count Transformation. Actually I found this article earlier this morning. It did not go into thjat much detail, for example, how does the Input and Output Properties get defined?
I created all of the DB objects but the EXEC SQL Task that calls EXEC etl.stp_InitExtractLog ?,'Customer' fails. The stored procedure has two paramters, one for PackageID and that variable has no value. I do not see what the packageID Variable is being set? So the Stored procedure fails because it can't insert a null value in the ExtractCount Column. The second paramater is 'Customer'. I believe that it should be 'Sales.Customer'. Thanks for replying.
|
|
| Friday, April 13, 2012 - 12:37:06 PM - KRK | Read The Tip |
|
ok, my bad the task that calls EXEC etl.stp_InitExtractLog completed successfully. However there seems to be an issue with the OLE DB Source. The parameter does not appear to be mapped. When I click on Preview for Get Updated Customers I get an message box 'No value given for 1 or more required paramaters?
|
|
| Friday, April 13, 2012 - 12:44:26 PM - Ray Barley | Read The Tip |
|
The PackageID variable is set in the Init Package Log Execute SQL Task. The stored proc that's called returns the value. Pay close attention to how you configure the task. On the General tab ResultSet is set to SingleRow. On the Parameter Mapping tab a package variable is passed as a parameter to the stored proc. On the Result Set tab the first column of the result set returned by the stored proc is assigned to the package variable User::v_PackageLogID |
|
| Friday, April 13, 2012 - 2:04:24 PM - KRK | Read The Tip |
|
Ray,
I checked everything and is configured as you specified.
WHen I execute the Stored Procedure in SSMS it returns the package ID,
On the results table the result name is 0 and the variable name is User::v_PackageLogID.
It appears that the variable User::v_PackageLogID Variable is not being set?
|
|
| Friday, April 13, 2012 - 2:34:02 PM - KRK | Read The Tip |
|
The OLE DB SOurce and Destination task faile and I'm getting the following errors: [SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. [Put Updated Customers in Staging [56]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "input "OLE DB Destination Input" (69)" failed because error code 0xC020907B occurred, and the error row disposition on "input "OLE DB Destination Input" (69)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure. [SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Put Updated Customers in Staging" (56) failed with error code 0xC0209029 while processing input "OLE DB Destination Input" (69). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure. [Get Updated Customers [1]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020. [SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Get Updated Customers" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure. |
|
| Friday, April 13, 2012 - 3:44:00 PM - KRK | Read The Tip |
|
Ray, I created the destination table with a SELECT INTO WHERE 1 = 2. There were NOT NULL Constraints, I removed them and it worked. Thanks for being patient with me? Did you write an article on Package Error Loging?
|
|
| Friday, April 13, 2012 - 4:55:30 PM - Ray Barley | Read The Tip |
|
I did not but if you search the MSSQLTips site for SSIS Error Log" you will get a number of tips. Click the Tips tab to get to the tips. |
|
| Saturday, April 14, 2012 - 8:25:31 AM - krk | Read The Tip |
|
Ray, I have 69 tables in each of two schemas (138 Tables). The tables are the same in each schema. Do you know how I could modifiy the Package to include the Schema Name? Thanks again.
|
|
| Saturday, April 14, 2012 - 9:28:51 AM - Raymond Barley | Read The Tip |
|
Add a SchemaName column to the ExtractLog and LoadLog tables. Add a SchemaName parameter to the stp_InitExtractLog and stp_InitLoadLog stored procedures; add the SchemaName column to the INSERT statements. |
|
| Saturday, April 14, 2012 - 11:43:44 AM - krk | Read The Tip |
|
I just need to add a paramater and modify the stored procedure & table to include the schema column. I can't think of an alternative way other than hard code the table & schema. I appreciate your replies. Thanks.
|
|
| Sunday, April 15, 2012 - 7:51:44 AM - Ray Barley | Read The Tip |
|
Hard-coding the schema name and table name is usually okay unless they are going to change. If they will change then you may be able to put them in a table that you read or possibly supply them a parameters when you run the package. Either way if the schema and table name are variable, then your SQL statements will need to by dynamic. |
|
| Monday, May 07, 2012 - 10:48:52 AM - Jackie | Read The Tip |
|
I just read this article today and I was able to get everything to work except the LoadLog - where is the store procedure logic for stp_InitLoadLog and stp_EndLoadLog - it was not in the zip file above? Can I use LoadLog for my load into staging and into dimensions and fact tables? If so can you show me how? |
|
| Monday, May 07, 2012 - 1:52:25 PM - Raymond Barley | Read The Tip |
|
The package only dealt with extracting data into staging. To do custom logging for loading dimension or fact tables, stp_InitLoadLog and stp_EndLoadLog would look almost identical to stp_InitExtractLog and stp_EndExtractLog; the execute SQL tasks would also be just about the same as they are in the package in this tip. |
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |