SSIS Transformation Tasks Overview
SQL Server Integration Services is a powerful tool to create ETL solutions. When developing a solution to import data from other sources there are often similar tasks that need to be performed such as: removing duplicates, inserting the same data into multiple destinations and only inserting new records. In this tip we will cover these different scenarios to give you an idea on how you can use some of the SSIS features.
We will cover these three tasks in our SSIS package:
- Import the source data into multiple destinations
- Remove duplicate records
- Only insert new records
So let's get started.
Import Source Data into Multiple Destinations with SSIS
First, we want to insert our source data into multiple destinations. For example, when you have a CSV file with 40 columns and your database structure has 3 tables that hold that type of data. In this case, we can use the Multicast Task.
This task is very simple, it is put right after the data source and connected to multiple destinations. We can then specify the different destination tables and map the columns you need for each table.
Remove Duplicates from Source Data in SSIS
Next, we want to remove duplicates from the source. We will cover two ways to do this.
- One way is to sort the data using the Sort Task. To do this, you select the column or columns you want to sort and then check the option "Remove rows with duplicate sort values" as shown below.
- Another option is to group the data using the Aggregate Transformation Task. In this case, the columns in the aggregation will be available for other tasks.
Below you can see both how to remove duplicates using the Sort Task:
Below you can see both how to remove duplicates using the Aggregate Transformation Task:
The difference between these is that the Aggregate Transformation Task has to include all fields that should be used to group the data and these fields will then be available for tasks that come after this task. The Sort Task sorts the data based on the selected columns and based on these values determines which records are duplicates.
Insert Only New Records with SSIS
The next task I will demonstrate is the Lookup Task, which is used to join columns to the data flow by looking up columns in a table. You then have the ability to send the rows that don't match as one output and rows that do match as another output.
Below the entire data flow is shown.
In the Lookup Transformation Task, we configure what happens with rows that don't have matching entries. Also, we configure the connection specifying a data source to use for the lookup.
For the columns page, we specify how columns will be matched for the Lookup.
Running the SSIS Package
Here we are running the package for the first time and inserting 3 records. If we run it again these records won't be inserted again since they already exist in the table from the first run, based on how we setup this SSIS package.
SQL Server Integration Services provide tasks to transform and validate data during the load process and transformations to insert data into your destination. Rather than create a stored procedure with T-SQL to validate or change data, is good to know about the different SSIS tasks and how they can be used.
- Read more about SSIS Tips here.
About the author
View all my tips