SQL Server Integration Services SSIS Data Flow Task Example
By: Koen Verbeeck
In this section, we will introduce the Integration Services Data Flow. It’s one of the more important features of SSIS and one of the reasons SSIS is considered one of the fastest ETL tools. We’ll give also an overview of the more important transformations you can do in the data flow.
The SSIS Data Flow
The data flow is a special task of the control flow. It needs a canvas of its own, so there’s an extra tab for the data flow, right next to the control flow.
The data flow is a construct where you can read data from various sources into the memory of the machine that is executing the SSIS package. While the data is in memory, you can perform different kinds of transformations. Because it’s in memory, these are very fast. After the transformations, the data is written to one or more destinations (a flat file, an Excel file, a database, etc.). In most cases, not all data is read into the memory at once - although this is possible if you use certain kind of transformations – but the data is read into buffers. Once a buffer is filled by the source component, it is passed on to the next transformation which does it logic on the buffer. Then the buffer is passed to the following transformation and so on until it is written to the destination. You can imagine the data flow is like a pipeline, with data flowing through.
To create a data flow, you can drag it from the SSIS toolbox to the control flow canvas.
Another option is to simply go to the data flow tab, where you will be greeted with the following message:
Clicking the link will create a new data flow task for you. You end up with an empty canvas, just like in the Control Flow.
As you can see in the screenshot above, the SSIS toolbox will change once you go to the data flow canvas. All the tasks are now replaced with transformations, sources and destinations for the data flow. At the top, you also have a dropdown box that lets you easily switch between multiple data flows if you have any.
SSIS Data Flow Sources and Destinations
These are some of the most common sources and destinations for the data flow:
- ADO.NET: With this component you can connect to various sources and destinations using .NET providers.
- Flat File
- ODBC: If you have an ODBC connection defined on your machine, you can use it to read or write data. Keep in mind Visual Studio is a 32-bit application, so you might want to have both 32-bit and 64-bit versions of the ODBC connection installed. You can give them the same name to make the transition smoother.
- OLE DB: With this source you can connect to any database for which an OLE DB provider is available. To date, the OLE DB source and destination are the fastest option for SQL Server.
There are other types of sources and destinations available. You can take a look at the documentation to learn more. You also have the possibility to use a .NET script component to make your own source or destination. This is similar to a .NET script task; you can use C# or VB, but now there are special methods and classes included to handle the buffers of the data flow.
SSIS Data Flow Transformations
Some of the more common transformations:
- Aggregate: Like a GROUP BY in T-SQL.
- Conditional Split: Splits out streams based on one or more conditions.
- Data Conversion: Allows you to convert columns from one data type to another.
- Derived Column: Allows you to manipulate existing columns or create new columns using expressions.
- Lookup: Similar to VLOOKUP in Excel. You match a row against a reference data set and retrieve one or more columns.
- Merge: Merges two streams together. It needs sorted inputs.
- Merge Join: Like a JOIN in T-SQL, it can do inner, left join and full outer joins. It needs sorted inputs as well.
- Multicast: Duplicates a stream into multiple streams.
- Script Component: With the component you can write your own transformations in .NET. When you open the editor, you have to choose if the component is a source, destination or a transformation.
- Sort: This transformation allows you to sort data and remove duplicates.
- Union All: Merges two streams, but doesn’t need sorted input.
In the next two sections of the tutorial, we’ll configure a source, some transformations and a destination.
- There are more transformations available, be sure to check out the documentation.