What is an SSIS Package
This tip will teach you what a Microsoft SQL Server Integration Services package is and what you can do with it for data integration projects when working with SQL Server, Oracle, Excel, CSV files or flat files, XML, stored procedures, etc. data sources.
A general introduction to the SQL Server Integration Services (SSIS) service is given in the tip What is Integration Services. There we explained SSIS is the ETL tool of the SQL Server stack, but in essence it's a workflow orchestrator often used for data warehousing or data integration projects. The unit of work in an SSIS project is a "package". The following screenshot shows a bunch of packages inside an SSIS project:
Why use an SSIS Package?
A single package in an Integration Services Project is supposed to do one single task, such as: load data into a fact table in the data warehouse, transfer data from one server to another or take a backup of database. Some data-related tasks that you want executed. SSIS packages can easily be scheduled to run frequently, it's also possible to create one-shot packages that execute a specific task just once. For example, import some sample data into a database.
Because SSIS packages are visual, it's often easier to quickly create an SSIS package instead of writing code that accomplishes the same task.
Even though you can create packages that do everything, it's a development best practice to create packages that have one single task. If a package tries to do too many things, it becomes harder to maintain, more difficult to understand and if it fails mid-execution, harder to restart without doing everything from scratch again.
Creating an SSIS Package
To create an SSIS package, you need Visual Studio. In the earlier versions of SSIS, you could download a free shell of Visual Studio if you didn't have a Visual Studio license. If you did have the full version of Visual Studio, you needed to install the business intelligence project templates for developing SSIS, SSAS and SSRS solutions. An historical overview of all the name changes:
- SQL Server 2005 – 2008R2: Business Intelligence Development Studio (BIDS), either as a shell or as templates for Visual Studio.
- SQL Server 2010: SQL Server Data Tools for Business Intelligence (SSDT-BI), not to be confused with SSDT for database projects. Either a shell or the templates.
- SQL Server 2012 – 2016: SQL Server Data Tools, either as a shell or as templates.
- SQL Server 2017: SQL Server Data Tools. Database projects are included in the installation of Visual Studio. For business intelligence projects, you can either install separate Visual Studio extensions or the stand-alone SSDT installer which installs a shell of Visual Studio.
- SQL Server 2019: SQL Server Data Tools. Again, database projects are included in the installation of Visual Studio. SSIS, SSAS and SSRS are only available as extensions. This means you need a full version of Visual Studio, there's no shell available. If you only use Visual Studio for the business intelligence projects, you can use the free Community Edition.
You can find more info in SQL Server Integration Services SSIS Versions and Tools. For SSIS 2019 (the latest version at the time of writing), you can installation info in the tip Install SQL Server Integration Services in Visual Studio 2019. Once you have the correct tooling for your version of SQL Server, you can create a new project, as explained in Create a SQL Server Integration Services SSIS Project. A new package will be automatically been added to the project:
The Different Parts of an SSIS Package
The control flow is the most important part of an SSIS package. It's used to configure all the tasks. The control flow is a visual canvas where tasks are connected with each other through precedence constraints. If tasks are not connected to each other, they run in parallel when the package is executed.
When the control flow tab is selected, the SSIS toolbox will contain all the different tasks and containers that can be used:
More information about the control flow can be found in the following tips:
- SQL Server Integration Services (SSIS) Control Flow
- SQL Server Integration Services SSIS Control Flow Example
- Defining Workflow in SSIS using Precedence Constraints
- Using Expressions On Precedence Constraints in Integration Services
- You can find an overview of control flow related tips here.
The data flow is a special task in the control flow. This task can load data from a data source in memory, transform the data using various available transformations and then write it to a destination. Since this needs additional configuration, a separate tab is available in the data flow. It's possible to have multiple data flows in one control flow. An SSIS package has only one control flow.
When you go to the data flow tab, the SSIS toolbox will show available sources, transformations and destinations. An example:
Keep in mind all data read from the source is read into memory. Some transformations – like the Aggregate and Sort transformation – need all data to be read into memory before they can start outputting rows. This might have a serious performance impact.
You can find more info about the data flow in the following tips:
- SQL Server Integration Services (SSIS) Data Flow
- SQL Server Integration Services SSIS Data Flow Task Example
- Improve SSIS data flow buffer performance
- SSIS Data Flow Parallel Processing
Parameters and Variables
If you want to make your package more flexible, you can use variables and parameters. The big difference between the two is that parameters are used as input when the package starts executing. They cannot change value during execution, while variables can change value.
Package parameters have their own tab:
Project parameters (which can be reused between different packages) are located in the Solution Explorer window:
Variables have their own window:
If you cannot find the variables window or tab, you can go to Extensions (in Visual Studio 2019) > SSIS > Variables.
The value of a variable can change due to an expression that has been defined, or because of the result of a task, such as an Execute SQL Task or a ForEach Loop Container.
More information about variables and parameters:
- SSIS Variable and Expression Example - Part 1
- Getting Started with SQL Server Integration Services SSIS Parameters vs. Variables
- Dynamic SQL Server Integration Services Workflow Based on Variables
- SSIS Expression Examples for Dates, String Concatenation, Dynamic File Names and More
Event handlers are an option in SSIS to execute additional tasks when a certain type of event occurs. Often the OnError event handler is used to take some actions in case the package fails, such as logging the error into a table or sending an email.
The tip Capturing SQL Server Integration Services Package Errors Using OnError Event Handlers has more info about event handlers.
The package explorer tab has a tree view of all the objects inside the SSIS package:
Debugging an SSIS Package
You can start debugging an SSIS package by hitting F5 or clicking the green arrow in the toolbar.
SSIS will first validate the package and then it will start executing. During execution, a progress tab will be added:
Once the execution is finished, the tab will be renamed to "Execution Results". While the package is executing, you can put breakpoints on different tasks. When a breakpoint is hit, the execution will pause which will allow you to inspect the state of the package and for example the value of different variables.
It's also possible to set breakpoints inside script tasks and script components. It's not possible to set a breakpoint on a transformation in the data flow, but you can add data viewers. When data flows through a path where a data viewer is set up, a window will pop-up which will allow you to inspect the data that is flowing through.
More about debugging:
- Debugging Control Flow in SQL Server Integration Services - Part 1 and Part 2.
- An example where a data viewer is used: Data Sampling in SQL Server Integration Services
- Retrieve the column causing an error in SQL Server Integration Services
When an SSIS package runs, you typically want to log some information about package execution. When did the package start? How long did it execute? Were there any errors or warnings? When you use the SSIS project deployment model, all information is automatically logged in the SSIS catalog (see the tip What is Integration Services for more info). But sometimes you need to log information yourself. You can either use an Execute SQL Task to write some data to a table, or a data flow to write data to a file, but you can also use the integrated logging of the package itself.
If you use the package deployment model, this is the only logging option there's available.
To get started, check out these tips:
- Integrated Logging with the Integration Services Package Log Providers
- Custom Logging in SQL Server Integration Services
- Integration Services Logging Levels in SQL Server 2016
- Logging Level Recommendations for the SQL Server Integration Services Catalog
- A couple of resources on this website to get started with SSIS.
- An overview of SSIS development tips can be found here.
- For the cloud based version of SSIS, learn about Azure Data Factory and Synapse Analytics.
Last Updated: 2021-04-05
About the author
View all my tips