What is SSIS?


By:   |   Updated: 2021-03-16   |   Comments   |   Related: More > Integration Services Development


A Powerful and Secure Alternative to SSIS

Free MSSQLTips Webinar: A Powerful and Secure Alternative to SSIS

SSIS is the de facto solution for SQL Server data integration and transformation. However, standard ETL processes are typically slow and result in data that is not always current which impedes decision making and business processes. Organizations are increasingly in need of better performing real-time data access without sacrificing critical access controls and data activity monitoring.


Problem

In this tip, we'll explain what the SQL Server Integration Services service is and things you should know to get started using SSIS.

Solution

According to the Microsoft documentation, the Integration Services (SSIS) service is the following:

"SQL Server Integration Services is a platform for building enterprise-level data integration and data transformations solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data."

SSIS is commonly known as an "ETL" product, a means to move and transform data from one point to another from various data sources (SQL Server or Oracle databases, Excel, CSV or Text files and more). But SSIS is so much more. In its bare essence, SSIS is a workflow orchestrator: it executes different tasks, either sequentially or in parallel, and different execution paths can be executed, depending on success, failure or other factors including precedence constraints. This means SSIS can be used in other scenarios than just data warehousing with the corresponding SSAS and SSRS Microsoft Data Platform products. Database administrators can use it for example to schedule and execute backups of their databases.

SSIS as a workflow orchestration service

An SSIS project contains one or more SSIS packages. A package can be considered as a single unit of work: it is supposed to do a single task, such as to load data into a dimension in the data warehouse, transfer data from one server to another or take a backup of a database.

In such a package, you can have multiple Control Flow and Data Flow Tasks. For example, when taking backups of databases, you can have first a script that checks the connection with SQL Server. If this succeeds, scripts are run to take the actual backups. If the backups are successfully completed, a script is run to log the info into a table. If the backups fail however, an email is sent to the administrator's team. The following diagram conceptualizes this scenario, where green arrows mean success and red arrows mean failure:

conceptual diagram of ssis package taking backups

Since there are no arrows between the backup tasks, they are executed in parallel.

There are many out-of-the-box tasks in SSIS, such as executing a SQL Script, sending an email, copying files in the file system, executing an .exe file, and so on as Control Flow tasks. However, if something is not included, you can extend functionality by writing your own scripts using .NET.

You can use SSIS any time you want to automate a set of tasks. But why use SSIS when you can for example write everything in T-SQL stored procedures? Or write a solution using PowerShell or .NET? SSIS has a couple of advantages over those solutions:

  • SSIS is mainly a visual tool. Packages are developed in Visual Studio and they use a visual canvas to arrange tasks on the Control Flow tab. This means the solution is more easily understood by someone new to the project. You can just look at the package and see all the dependencies, while with code this is much harder to understand. This is an example of an SSIS package is Visual Studio:
sample package
  • SSIS can easily transfer data between multiple servers. In a T-SQL solution, this is harder as you most likely need linked servers which are not optimal for performance.
  • In SSIS, it's straight forward to run tasks in parallel, greatly optimizing performance. In pure code solutions, this is much harder, if not impossible to do.
  • Some tasks are harder to do in T-SQL than in SSIS, such as using regular expressions, connecting with FTP servers, interacting with files in the file system, looping over files, etc.

ETL vs ELT with SQL Server Integration Services

SSIS is usually referred to as "the ETL tool of Microsoft". But what is ETL exactly? It stands for Extract – Transform – Load. With SSIS, you can load data from multiple types of sources (flat files, Excel files, SQL Server and Oracle relational databases, XML, OLE DB data sources and so on), transform data on-the-fly and write the data to multiple types of destinations.

ETL

An ETL tool is typically used when you have to integrate data from multiple sources into a destination, such as a data warehouse. For this reason, SSIS is also typically considered an important part of business intelligence projects.

In the past few years, the pattern to load data has moved from ETL to ELT, which stands for Extract – Load – Transform. In this scenario, data is extracted from the source, dumped into the destination (which is typically very fast) and then the destination itself will transform the data, not SSIS. For example, data is extracted from various source systems (CRM database, ERP system …) and dumped into staging tables in SQL Server. Then, using T-SQL scripts, the data is transformed as it is loaded into the data warehouse (which is also built in SQL Server). You can apply the same pattern in Azure by using for example Azure Data Lake and Azure Databricks.

ELT pattern

The advantage of ELT is that you take the strengths of the destination into account. The SQL Server database engine is powerful and very well suited to run large analytical SQL statements on large databases. Luckily, SSIS adapts well to an ELT scenario. You can extract and load the data with SSIS and then orchestrate the execution of the T-SQL scripts.

SSIS for the Database Administrator

Not only data warehouse projects benefit from Microsoft SQL Server Integration Services, but developers and administrators can use SSIS to their advantage as well. Any time some tasks related to SQL Server need to be automated, SSIS can be used. It's possible you have used SSIS without even knowing it! For example, the import/export wizard in SQL Server Management Studio (SSMS) uses SSIS behind the scenes.

import export wizard

With this graphical tool, you can import or export data in and out of SQL Server without writing a single line of code. To learn more, check out these tips:

Also, maintenance plans from SSMS are in fact SSIS packages:

maintenance plan

The tip Getting Started with SQL Server Maintenance Plans - Part 1 is an excellent starting point to learn more about this feature.

In SSIS itself, there are many tasks provided for administrative purposes:

admin tasks in SSIS

How do you obtain Integration Services?

SSIS is part of the SQL Server offering. When you install SQL Server, you have the option in the SQL Server set-up to install the SSIS service as well. The following image comes from the tip Installing SQL Server Integration Services.

SSIS Install Screen 8

This has two consequences:

  • To install SSIS, you need a SQL Server license. Depending on the tasks you want to do with SSIS, you either need Enterprise or Standard licensing. In most scenarios, Standard should be sufficient. It also means you need a SQL Server license for every machine you install SSIS on. If you install SSIS on the same machine as the database engine, SSIS "comes free" with the SQL Server license.
  • Since SSIS is part of the SQL Server stack, it is mainly used for data-related projects, such as data integration, data migration, data warehousing, or the management of SQL Server related objects. If you don't have SQL Server and you are looking to automate tasks – such as archiving files – you might want to look at alternative solutions such as PowerShell, since SQL Server licenses don't come cheap.

Administration of SSIS

SSIS has actually two modes of development: the package deployment model and the project deployment model. The first one exists since the release of SSIS in 2005. The last one was introduced in Microsoft SQL Server 2012. In the package deployment model, packages are either stored on the file system or in the msdb database. There's not much administration involved.

In the project deployment model, packages are stored in a centralized database called SSISDB. On top of this database a management layer is built: the SSIS catalog. The catalog integrates management of the SSIS packages and configurations, as well as built-in logging and reporting of the executions. The SSISDB can be managed like any other user database in SQL Server. The tip Understanding the SQL Server Integration Services Catalog and creating the SSISDB Catalog is a great start to learn more.

ssis catalog

For more information about the project deployment model, check out the tips SSIS Project Deployment Model in SQL Server 2012 (Part 1) and (Part 2).

It's also possible to migrate SSIS project to Azure Data Factory. In this case, the SSIS catalog is hosted in either Azure SQL DB or in an Azure SQL Managed Instance. Check out the tip Configure an Azure SQL Server Integration Services Integration Runtime for more info.

Next Steps


Last Updated: 2021-03-16


get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips



Comments For This Article





download





Recommended Reading

Import UTF-8 Unicode Special Characters with SQL Server Integration Services

SSIS Expression Examples for Dates, String Concatenation, Dynamic File Names and More

Import Text and CSV Files into SQL Server Database with SSIS Script Task

Extract, Import and Migrate SSIS Project

Install SQL Server Integration Services in Visual Studio 2019














get free sql tips
agree to terms