Error Handling in SSIS Data Flow Task - Part 1


By:   |   Updated: 2020-06-19   |   Comments (4)   |   Related: More > Integration Services Development

Problem

As a SQL Server Business Intelligence Developer I would like to implement and investigate error handling mechanism by manually debugging an SQL Server Integration Services (SSIS) package error to find the cause before deciding to keep or discard the error rows.

Solution

We can implement an error handling strategy by redirecting the error output to store the inconsistent data for further investigation.

We can then debug an SSIS Package manually by working on a copy of its Data Flow Task while keeping the original Data Flow Task intact at the same time.

The difference between the original and copied version is that one handles or redirects the error while other throws it.

Problem Breakdown

Let us simplify the problem scenario by breaking it into the following parts:

  1. The data needs to be transferred from source to destination
  2. The data flow must not terminate even if an inconsistent record is received
  3. The inconsistent data must be stored for later investigation
  4. The inconsistent data must be justified before it is discarded by finding the reason of its being rejected

Solution Breakdown

We are proposing the solution of above problems as follows:

  1. An SSIS Package can be used to copy data from source to destination
  2. Redirecting error output can help us to keep the data transfer running even if an inconsistent or invalid record is received
  3. Storing the error output to another storage such as table or file can help us to investigate it later
  4. Replicating the error by removing the error output helps us to understand the reason behind the error

The first part of this tip is focused on creating an SSIS Package to copy the data from source to destination along with creating an error handling mechanism to store the error and to ensure that the process does not terminate even if an invalid record is received.

Data Transfer (Copy) through SSIS Package with Error Handling

About the Problem Scenario

Please consider a data transfer (copy) scenario where an SSIS Package has been designed to copy the data from source to destination.

Let us discuss the SSIS Package and the related things required to be setup for the problem(s) to exist and the solution(s) to work.

The Role of Data Flow Task

The data between source and destination is copied by the help of a Data Flow Task in the SSIS Package.

Error Handling Mechanism

The Data Flow Task offers an error handling mechanism by redirecting errors to a predetermined output so that the data transfer process continues to work even after encountering an error for it to be consistently doing its job.

The philosophy behind error handling is to catch the invalid rows to be investigated later on while making sure that general data transfer does not fail in the middle of the process because of one or more invalid rows.

This does not make sense when we have a handful of records, but in real world scenarios millions of rows are getting transferred (copied) from source to destination and then the importance of error handling mechanism becomes evident.

Setup Source Database (WatchesV5)

Please create a sample database called WatchesV5 to represent source by running the following T-SQL script:

-- (1) Create WatchesV5 database 
CREATE DATABASE WatchesV5;
GO
 
-- (2) Connect to WatchesV5 database
USE WatchesV5
 
-- (3) Creating Watch table
CREATE TABLE [dbo].[Watch]
(
   [WatchId] INT NOT NULL IDENTITY(1,1), 
   [WatchType] VARCHAR(40),
   [Brand] VARCHAR(40),
   [Colour] VARCHAR(20),
   [Price] DECIMAL(10,2)
    CONSTRAINT [PK_WatchType] PRIMARY KEY (WatchId),
);
 
 
-- (4) Populating Watch table
SET IDENTITY_INSERT dbo.Watch ON
INSERT INTO [dbo].[Watch] ([WatchId], [WatchType], [Brand], [Colour], [Price]) VALUES (1, N'Digital', N'Timex Ironman ', N'Black', CAST(150.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[Watch] ([WatchId], [WatchType], [Brand], [Colour], [Price]) VALUES (2, N'Analogue', N'Casio G-Shock', N'Blue', CAST(200.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[Watch] ([WatchId], [WatchType], [Brand], [Colour], [Price]) VALUES (3, N'Smart', N'Garmin Smartwatch', N'Yellow', CAST(300.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[Watch] ([WatchId], [WatchType], [Brand], [Colour], [Price]) VALUES (4, N'Sports', N'Timex Expedition', N'Black', CAST(150.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[Watch] ([WatchId], [WatchType], [Brand], [Colour], [Price]) VALUES (5, N'Digital', N'Seiko Superrunner', N'Brown', CAST(185.00 AS Decimal(10, 2)))
SET IDENTITY_INSERT dbo.Watch OFF

Quick View Source Database

Check the newly created sample source database as follows:

-- View source database
SELECT TOP 3 w.WatchId,w.Brand,w.Colour FROM dbo.Watch w

The output is shown below:

Source Database

Setup Destination Database (WachesV5DW)

We are now setting up the destination database running the following T-SQL script against master database:

-- (1) Create WatchesV5DW database 
Create DATABASE WatchesV5DW;
GO
 
-- (2) Connect to WatchesV5DW database 
USE WatchesV5DW;
GO
 
-- (3) Creating Watch table in Staging environment
CREATE TABLE [dbo].[Watch] (
    [WatchId]   INT             NOT NULL,
    [WatchType] VARCHAR (40)    NULL,
    [Brand]     VARCHAR (40)    NULL,
    [Colour]    VARCHAR (20)    NULL,
    [Price]     DECIMAL (10, 2) NULL, 
    CONSTRAINT [PK_Watch] PRIMARY KEY ([WatchId])
);

The destination database is empty because we are going to populate the database by copying the data through SSIS Package.

Setup SSIS Package for Data Transfer Task

We have to create an Integration Services Project in Visual Studio to setup a data copy task from source to destination.

Create Integration Services Project and Package

Create a new Integration Services Project called copy data with error handler under a new solution Debug SSIS Package Solution in Visual Studio.

Name the SSIS Package as copy-watches-data as follows:

Integration Services Project

Create Source and Destination Connections

Right click anywhere inside the Connection Managers area and Click New OLE DB Connection… as shown below:

New OLE DB Connectionů

Create a new connection manager pointing to the source database after inputting the provider, server and database information and clicking OK as shown below:

New connection manager pointing to the source database

Rename the connection as WatchesV5-Source.

Create a second connection for the destination database in the same way you created the connection for the source database, but please remember to point it to Watches5DW this time:

Configuring destination database connection

Rename the connection as WatchesV5DW-Destination.

Create Data Flow Task to copy data

Next please drag Data Flow Task component from SSIS Toolbox and drop it onto the Control Flow surface and rename it DFT Copy Watches Data:

Data Flow Task to copy data

Double click Data Flow Task and add an OLE DB Source component pointing to WatchesV5-Source connection and selecting dbo.Watch table as follows:

OLE DB Source component pointing to WatchesV5-Source

Create OLE DB Destination component by bringing it forward from the SSIS Toolbox followed by joining OLE DB Source with it:

OLE DB Destination Component to be configured

Configure OLE DB Destination and point it to the destination database WatchesV5DW:

Configure OLE DB Destination

Please make sure the following things are done:

  • Keep Identity option has been checked
  • Mappings are correct (which should be fine if you have followed all the steps)

Please refer to Create Custom SSIS Execution Log - Part 1 tip to get see how column mappings are checked.

Rename OLE DB Source as SRC Watch followed by renaming OLE DB Destination as DST Watch.

Create Error Handler for the SSIS Package

It is recommended to have some error handling mechanism in place to ensure smooth transferring of data even if an invalid row occurs.

Redirecting Errors from OLE DB Destination

Double click OLE DB Destination component (DST Staging Watches) and go to Error Output from the left navigation bar.

Select Redirect row from Error dropdown box and then select Redirect row at the bottom of the window. Click Apply to apply the changes and then Click OK as shown below:

Selecting Redirect row option

Creating Error Table

The errors can be sent to any output including text files, but to keep simple when starting off.  Create an error table called WatchErrors in the destination database as follows:

USE WatchesV5DW
 
-- Creating WatchErrors table
CREATE TABLE [WatchErrors] (
[WatchErrorId] int primary key identity(1,1),
    [WatchId] int,
    [WatchType] varchar(40),
    [Brand] varchar(40),
    [Colour] varchar(20),
    [Price] numeric(10,2),
    [ErrorCode] int,
    [ErrorColumn] int
);
GO

Go to SQL Server Object Explorer in Visual Studio to check the newly created table after refreshing the database:

Table to store errors

Adding Error Handler

Drag another OLE DB Destination component onto the Data Flow Task designer and rename it DST Errors after joining DST Watch error output with it as shown below:

Adding Error Handler

Select the same connection manager we selected for the destination database and select WatchErrors table in the next section and then click OK:

Configuring Error Output Destination

We are all set now since the error handling mechanism is ready to be used when required.

Run SSIS Package to Copy Data from Source to Destination

It is time to run the SSIS Package, so right click on the package and click Execute:

SSIS Package Run to Copy Data from Source to Destination

The package has run successfully so please check the destination database by running the following script:

USE WatchesV5DW
 
-- View destination database after the SSIS Package to copy the data has run
SELECT sw.WatchId,sw.Brand,sw.Colour FROM dbo.Watch sw
 

The output as follows:

View destination database after package run

Error Handling Mechanism Check

Next, we are going to check if the error handling mechanism is working or not.

There are many ways to check if the error handling strategy in place, but we are using a simple approach by re-running the SSIS Package.

Please remember if we re-run the package the same rows are going to be loaded again into the destination database and this is when we are able to see if the error handler is working or not since the target table has primary key constraint in place to reject such data and in the absence of error handler the rejected rows are going to terminate the package as running abnormally.

Re-run SSIS Package to copy the same data

Right click on the SSIS Package and click Execute Package to run it:

Rerun SSIS Package

The output is as follows:

5 rows being sent to error output

The package has run successfully by redirecting the duplicate data into the Error table.

Stored Error Rows Check

Please check the error table where error rows were sent during the SSIS Package run:

-- Connect to destination database
USE WatchesV5DW
 
-- View destination database after the SSIS Package to copy the data has run
SELECT we.WatchId,we.Brand,we.Colour,we.ErrorCode FROM dbo.WatchErrors we

The results are as follows:

Error Rows Successfully Stored

Congratulations, we have implemented following parts of the solution:

  1. SSIS Package has been designed and tested to copy data from source to destination (database)
  2. The error handling mechanism works well when inconsistent data is inserted into the destination (table)
  3. The error rows are also getting stored for later investigation

However, the final and crucial bit of the solution which helps us in investigating the error rows by manually debugging the SSIS Package is yet to be explored in the next part of the tip.

Next Steps
  • Please add another table called WatchOrder to the source and destination databases followed by adding and test running its error handling mechanism by following the examples in this tip.
  • Please try to disable error handler mentioned in the above tip and run the package with consistent data followed by loading inconsistent data to see it terminating abnormally.


Last Updated: 2020-06-19


get scripts

next tip button



About the author
MSSQLTips author Haroon Ashraf

Haroon Ashraf's interests are Database-Centric Architectures and his expertise includes development, testing, implementation and migration along with Database Life Cycle Management (DLM).

View all my tips






Comments For This Article




Thursday, July 02, 2020 - 4:47:06 AM - Haroon Ashraf Back To Top

Hi,

Thank you for your valued comments.

Yes, data remains one of the most valuable assets of business and in any data transfer scenario the primary objective should always be safe and complete transfer of data without losing its integrity.

However, there should be a pretty solid well defined scope of our work as a developer/tester otherwise we may not be able to cope with all the possible scenarios of data loss.

For example if the source itself is corrupt then we should not be taking the responsibility of that because this is beyond the scope of our work and reach but at the same time we should be very carefully handling this situation so that the business stake holders do understand the weaknesess in their existing system and things beyond our reach.

A good developer is the one who does his best by understanding the problem, desiging a good solution and testing it (development testing) multiple times before it is handed over to the testing team but at the same time he knows the limitation and scope of his work and make sure that his team is well aware of it.

In short, ensuring the source remains stable is something exceptional to take on as a responsibility for an business intelligence or ETL developer  because this is somehting to be done on that end by the development team taking care of that end of data such as source databases in Production.


Sunday, June 28, 2020 - 7:51:00 PM - Soumen kapri Back To Top

Hi,

Sometimes at source task also It gets failed due to erroneous data which SSIS is unable to handle/read. Considering that it is better to add redirecting mechanism at source task also similar to the destination task.

Another point, for fully automation you can add a variable to count the rows been redirected and then based on that count add send mail task to alert the team about data discard. In case of data sensitivity you can send an alert only or else can attach the data also.

This will serve two purpose

1. Team gets to know that there is erroneous data coming from source.

2. Team can check with source team and fix the data at source end so that there will be no data loss.

In this data driven IT world data is very important and we should be be careful on such scenarios.


Saturday, June 27, 2020 - 8:15:01 AM - Haroon Ashraf Back To Top

Mr. Mohammad Shahrnevis,

Thank you for your valued feedback.

You have asked an excellent question but please remember you may find answers of many such questions in the next part of the tip to be published soon.

As a general rule of thumb please remember most of the time (in real time scenarios) it can be a single error behind a complete batch (may be hundreds of records) failure and to see it in action please stay in touch as I have plans to write more about it.

I hope this helps and thank you for once again for your comments.


Thursday, June 25, 2020 - 6:20:53 PM - Mohammad Shahrnevis Back To Top

Dear Haroon Ashraf

Thanks alot about your article.

But can you ell us how we can find what kind of error raised for every record that stor in error table? Otherwise how can find definition error from error codes field in error table?

Thank you



download


Recommended Reading

Import UTF-8 Unicode Special Characters with SQL Server Integration Services

Install SQL Server Integration Services in Visual Studio 2019

SSIS Expression Examples for Dates, String Concatenation, Dynamic File Names and More

Backwards Compatibility in SQL Server Data Tools for Integration Services

SSIS Toolbox is not visible in SQL Server Data Tools





get free sql tips
agree to terms


Learn more about SQL Server tools