What is SSIS?
In this tip, we'll explain what the SQL Server Integration Services service is and things you should know to get started using SSIS.
According to the Microsoft documentation, the Integration Services (SSIS) service is the following:
"SQL Server Integration Services is a platform for building enterprise-level data integration and data transformations solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data."
SSIS is commonly known as an "ETL" product, a means to move and transform data from one point to another from various data sources (SQL Server or Oracle databases, Excel, CSV or Text files and more). But SSIS is so much more. In its bare essence, SSIS is a workflow orchestrator: it executes different tasks, either sequentially or in parallel, and different execution paths can be executed, depending on success, failure or other factors including precedence constraints. This means SSIS can be used in other scenarios than just data warehousing with the corresponding SSAS and SSRS Microsoft Data Platform products. Database administrators can use it for example to schedule and execute backups of their databases.
SSIS as a workflow orchestration service
An SSIS project contains one or more SSIS packages. A package can be considered as a single unit of work: it is supposed to do a single task, such as to load data into a dimension in the data warehouse, transfer data from one server to another or take a backup of a database.
In such a package, you can have multiple Control Flow and Data Flow Tasks. For example, when taking backups of databases, you can have first a script that checks the connection with SQL Server. If this succeeds, scripts are run to take the actual backups. If the backups are successfully completed, a script is run to log the info into a table. If the backups fail however, an email is sent to the administrator's team. The following diagram conceptualizes this scenario, where green arrows mean success and red arrows mean failure:
Since there are no arrows between the backup tasks, they are executed in parallel.
There are many out-of-the-box tasks in SSIS, such as executing a SQL Script, sending an email, copying files in the file system, executing an .exe file, and so on as Control Flow tasks. However, if something is not included, you can extend functionality by writing your own scripts using .NET.
You can use SSIS any time you want to automate a set of tasks. But why use SSIS when you can for example write everything in T-SQL stored procedures? Or write a solution using PowerShell or .NET? SSIS has a couple of advantages over those solutions:
- SSIS is mainly a visual tool. Packages are developed in Visual Studio and they use a visual canvas to arrange tasks on the Control Flow tab. This means the solution is more easily understood by someone new to the project. You can just look at the package and see all the dependencies, while with code this is much harder to understand. This is an example of an SSIS package is Visual Studio:
- SSIS can easily transfer data between multiple servers. In a T-SQL solution, this is harder as you most likely need linked servers which are not optimal for performance.
- In SSIS, it's straight forward to run tasks in parallel, greatly optimizing performance. In pure code solutions, this is much harder, if not impossible to do.
- Some tasks are harder to do in T-SQL than in SSIS, such as using regular expressions, connecting with FTP servers, interacting with files in the file system, looping over files, etc.
ETL vs ELT with SQL Server Integration Services
SSIS is usually referred to as "the ETL tool of Microsoft". But what is ETL exactly? It stands for Extract – Transform – Load. With SSIS, you can load data from multiple types of sources (flat files, Excel files, SQL Server and Oracle relational databases, XML, OLE DB data sources and so on), transform data on-the-fly and write the data to multiple types of destinations.
An ETL tool is typically used when you have to integrate data from multiple sources into a destination, such as a data warehouse. For this reason, SSIS is also typically considered an important part of business intelligence projects.
In the past few years, the pattern to load data has moved from ETL to ELT, which stands for Extract – Load – Transform. In this scenario, data is extracted from the source, dumped into the destination (which is typically very fast) and then the destination itself will transform the data, not SSIS. For example, data is extracted from various source systems (CRM database, ERP system …) and dumped into staging tables in SQL Server. Then, using T-SQL scripts, the data is transformed as it is loaded into the data warehouse (which is also built in SQL Server). You can apply the same pattern in Azure by using for example Azure Data Lake and Azure Databricks.
The advantage of ELT is that you take the strengths of the destination into account. The SQL Server database engine is powerful and very well suited to run large analytical SQL statements on large databases. Luckily, SSIS adapts well to an ELT scenario. You can extract and load the data with SSIS and then orchestrate the execution of the T-SQL scripts.
SSIS for the Database Administrator
Not only data warehouse projects benefit from Microsoft SQL Server Integration Services, but developers and administrators can use SSIS to their advantage as well. Any time some tasks related to SQL Server need to be automated, SSIS can be used. It's possible you have used SSIS without even knowing it! For example, the import/export wizard in SQL Server Management Studio (SSMS) uses SSIS behind the scenes.
With this graphical tool, you can import or export data in and out of SQL Server without writing a single line of code. To learn more, check out these tips:
Also, maintenance plans from SSMS are in fact SSIS packages:
The tip Getting Started with SQL Server Maintenance Plans - Part 1 is an excellent starting point to learn more about this feature.
In SSIS itself, there are many tasks provided for administrative purposes:
How do you obtain Integration Services?
SSIS is part of the SQL Server offering. When you install SQL Server, you have the option in the SQL Server set-up to install the SSIS service as well. The following image comes from the tip Installing SQL Server Integration Services.
This has two consequences:
- To install SSIS, you need a SQL Server license. Depending on the tasks you want to do with SSIS, you either need Enterprise or Standard licensing. In most scenarios, Standard should be sufficient. It also means you need a SQL Server license for every machine you install SSIS on. If you install SSIS on the same machine as the database engine, SSIS "comes free" with the SQL Server license.
- Since SSIS is part of the SQL Server stack, it is mainly used for data-related projects, such as data integration, data migration, data warehousing, or the management of SQL Server related objects. If you don't have SQL Server and you are looking to automate tasks – such as archiving files – you might want to look at alternative solutions such as PowerShell, since SQL Server licenses don't come cheap.
Administration of SSIS
SSIS has actually two modes of development: the package deployment model and the project deployment model. The first one exists since the release of SSIS in 2005. The last one was introduced in Microsoft SQL Server 2012. In the package deployment model, packages are either stored on the file system or in the msdb database. There's not much administration involved.
In the project deployment model, packages are stored in a centralized database called SSISDB. On top of this database a management layer is built: the SSIS catalog. The catalog integrates management of the SSIS packages and configurations, as well as built-in logging and reporting of the executions. The SSISDB can be managed like any other user database in SQL Server. The tip Understanding the SQL Server Integration Services Catalog and creating the SSISDB Catalog is a great start to learn more.
For more information about the project deployment model, check out the tips SSIS Project Deployment Model in SQL Server 2012 (Part 1) and (Part 2).
It's also possible to migrate SSIS project to Azure Data Factory. In this case, the SSIS catalog is hosted in either Azure SQL DB or in an Azure SQL Managed Instance. Check out the tip Configure an Azure SQL Server Integration Services Integration Runtime for more info.
- There are many resources on MSSQLTips.com to get started with SSIS.
- Tips about SSISDB and the Catalog:
- There's a whole section of tips about SQL Server maintenance plans on this page.
About the author
View all my tips
Article Last Updated: 2021-03-16