SSIS Interview Questions and Answers for Developers and DBAs
Many articles that provide SSIS interview questions and answers only contain questions about specific SSIS components and miss some critical information that should be acquired by an SSIS development/administration (DBA) job applicant. In this tutorial, we look at many questions that could come up during an interview related to Microsoft SQL Server Integration Services (SSIS).
This article provides more than 50 questions that can be used within an SSIS job interview. The questions are classified into four categories.
General Knowledge Questions
What is ETL?
ETL stands for Extract, Transform, and Load. It is a three-phase process that extracts data from different sources, applies some transformations, and loads it into a destination repository (database, data warehouse…).
What is SSIS?
SSIS stands for SQL Server Integration Services. It is the official ETL tool in SQL Server.
What is the difference between a database, a data warehouse, and a data lake?
A database is a repository that stores structured data used for transaction and retrieval operations. A data warehouse is a structured data repository used for analytical purposes. A data lake is a data repository where data is stored in its raw format and can later be loaded into a data warehouse or searched using advanced techniques.
When are ETL jobs usually executed?
After working hours / at midnight.
How are ETL jobs scheduled in SQL Server?
Using SQL Server Agent Jobs.
What is the difference between ELT and ELT?
ETL transforms data before loading it on the server, while ELT transforms it afterward. ELT is more suitable for larger volumes of data.
What is the difference between relational and non-relational databases?
In a relational database, data is structured and stored within tables. Moreover, relations are modeled using primary/foreign key relations. In contrast, non-relational databases support semi-structured and unstructured data.
What are lookup tables?
Lookup tables are commonly used for standardizing and unifying values within a text column, ensuring data integrity and improving data load performance. The values within a data table are replaced with a code that points to the value stored within a lookup table.
What does data staging mean?
It is a location where data is stored before being modified and exported to the destination data repository (database/data warehouse)
What is data cleaning?
It is a set of processes and transformations to remove erroneous values from the data to make it more readable and consumable by the end-user by guaranteeing an acceptable data quality level.
What is data wrangling?
Data wrangling focuses on changing the data format to make it usable within an analytical context.
SSIS Development Questions
What is the difference between a control flow and a data flow?
Control flow and data flow are two execution levels in SSIS. A data flow defines a flow from a data source to a destination. In contrast, control flow defines a higher level of workflow of tasks to be executed. A control flow can contain one or more data flow tasks.
What is the difference between an SSIS task and a component?
The SSIS task is executed on the package control flow level, while the SSIS component is executed on the data flow level.
What SSIS objects are required to connect to a data source?
A connection manager is required to establish a connection to a data source.
What are SSIS containers?
Containers are objects that provide structure to packages and services to tasks. It enables the repetition of control flows in packages and the grouping of tasks and containers into meaningful work units. Containers can include other containers in addition to tasks. There are four types of containers in SSIS:
- For loop container
- For each loop container
- Sequence container
- Task host container
Official documentation: Integration Services Containers
What is a precedence constraint?
As part of the control flow, precedence constraints link executables, containers, and tasks in a package and specify conditions for executing executables.
Official documentation: Precedence Constraints
What are the data flow components types?
There are three components available in the SSIS data flow:
- Source components
- Transformation components
- Destination components
What are SSIS expressions?
Expressions can be used to define conditional statements, create and update values in data columns, assign values to variables, update or populate properties at run time, define constraints in precedence constraints, and provide the expressions used by containers.
Official documentation: Integration Services (SSIS) Expressions
What is the difference between SSIS variables and parameters?
Parameters store a value that should be configured before the package execution. They are read-only during the execution of a package. In contrast, variables can be modified during the package execution.
For more information: SSIS Parameters vs. Variables
How do you add a computed column into the data pipeline?
We should use a derived column transformation.
What are the SSIS objects that can handle XML data?
Two main SSIS objects can handle XML data type:
- XML task: used to perform several operations unrelated to the data import process, such as schema validation.
- XML source: used to read data from an XML file.
How do you convert a column's data type in SSIS?
There are several ways to convert a columns data type:
- Changing the data type from the source advanced editor
- Using a data conversion transformation
- Using a derived column transformation
How do you extend SSIS functionalities?
Two objects allow extending the SSIS functionalities by writing customized C# or VB.NET scripts:
- Script Task: execute a script on the control flow level.
- Script Component: execute a script within a data flow task.
Is SSIS able to continue the package execution from the last successful step before a failure?
Yes, this can be done using Checkpoint.
Official documentation: Restart Packages by Using Checkpoints
What is the difference between the merge and merge join transformations?
The Merge transformation is similar to the UNION statement in SQL. In comparison, the Merge Join transformation is similar to the JOIN statement.
What is the difference between the execute SQL task and execute T-SQL statement task?
Execute SQL Task allows the user to execute statements over SQL Server and other supported sources. In contrast, the Execute T-SQL statement task is only dedicated to SQL Server (ADO.net) connections.
What is the difference between the OLE DB destination and the SQL Server destination?
OLE DB destination supports more DBMS than SQL Server and can insert data into a remote SQL Server. In contrast, SQL Server destination only supports local SQL Server database and has more restrictions.
For more information: SQL Server Destination vs. OLE DB Destination - Stack Overflow
When should we use raw files?
Raw files are binary files supported in SSIS. They are meant to increase the performance of data staging operations compared to flat files.
For more information: SSIS Raw File Source and Destination Example
Does SSIS have the ability to connect to the Hadoop ecosystem?
There are several components added in SQL Server 2016 that allow SSIS to read data and execute commands on a Hadoop cluster:
- Hadoop connection manager
- HDFS source and destination
- Pig Task
- Hive Task
Official documentation: Hadoop and HDFS Support in Integration Services (SSIS)
How can SSIS be integrated with Azure?
The SQL Server Integration Services (SSIS) Feature Pack for Azure provides several components that give the ability to connect to Azure services, transfer data between Azure and on-premises data sources, and process data stored in Azure.
Official documentation: Azure Feature Pack for Integration Services (SSIS)
How do you distribute a data pipeline?
We can distribute the data pipeline using the Balanced Data Distributor component.
Official documentation: Balanced Data Distributor Transformation
What are event handlers?
SSIS Event handlers can be considered a separate package that operates asynchronously once an event occurs.
Official documentation: Integration Services (SSIS) Event Handlers
How do you log SSIS events?
SSIS includes several log providers that allow storing the package logs within an external file or a SQL database.
Official documentation: Integration Services (SSIS) Logging
What is the difference between OLE DB, ODBC, and ADO objects in SSIS?
In brief, OLE DB, ODBC, and ADO.NET are data access providers that allow SSIS to connect with different data sources. In addition to the SQL syntax differences, ADO.NET and OLE DB can access the same data sources. In contrast, ODBC can only access relational database systems, and they cannot access XML-based sources, Microsoft Exchange Server, and others.
How do you load data as batches from a SQL database?
We can use a for-loop container to specify the row count per batch and use an OLE DB Source with a SQL Command data access mode to read the data using an SQL OFFSET FETCH command.
For more information: Reading Huge volume of data from SQLite to SQL Server fails at pre-execute
How do you loop over values retrieved from a SQL table?
We can use an Execute SQL Task to store the SQL query result within a result set object. Then we can loop over the resultset rows using a for each loop container.
How do you loop over files stored in a specific directory?
We can use a for each loop container to loop over text files stored within a directory.
What components are used to read open data?
We can use the SSIS ODATA connection manager and ODATA source to read data from an open data provider.
For more information: Using the OData Source in SSIS
When should we use bulk insert task instead of data flow task to import data into SQL Server?
Bulk Insert tasks are more performant than data flow tasks and are more suitable for ELT pipelines. But they only support text files and requires some additional permission. In comparison, data Flow tasks are more suitable for ETL pipelines.
For more information: SSIS Load Staging Data with Data Flow Task or Bulk Insert Task
What feature allows SSIS tasks to be reused over several packages?
SSIS Package Parts.
Official documentation: Reuse Control Flow across Packages by Using Control Flow Package Parts
How do you adjust a data pipeline buffer size?
There are three data flow task properties used to adjust the data pipeline buffer size:
For more information: SSIS Performance Tuning When Designing ETL Pipelines
SSIS Administration Questions
What is SSISDB?
The SSISDB catalog is the central point for executing and managing Integration Services (SSIS) projects you've deployed to the Integration Services server.
Official documentation: SSIS Catalog - SQL Server Integration Services (SSIS)
What are the different options to save an SSIS package?
There are three options:
- Standalone .dtsx file
- SQL Server msdb database
- SSIS catalog
What are the different SSIS package deployment configurations?
There are two deployment configurations:
- Single package deployment
- Project deployment
Can the SSIS package be deployed to a previous version of SQL Server?
Yes, by adjusting the TargetServerVersion property in the SSIS project configuration.
Can SSIS packages be scaled over several servers?
Yes, Using SSIS Scale Out.
Official documentation: SQL Server Integration Services (SSIS) Scale Out.
Are there any SSIS package development automation tools?
Several tools can automate the SSIS package creation. For example:
- Integration Services Object model
- EzApi class library
For more information: SQL Server Integration Services packages automation tools
Are there any SSIS DevOps services?
Microsoft provides two SSIS DevOps tools:
- SSISBuild: automates building SSIS packages
- SSISDeploy: automates the deployment of the packages.
Official documentation: Standalone SQL Server Integration Services (SSIS) DevOps Tools
What is a .ispac file?
The .ispac is the deployment file extension of an SSIS project.
SSIS Online Communities Questions
While many people may find the question below out of the scope of an SSIS interview, being active in an online SSIS community should be a required skill of the job applicant.
What are the active SSIS online communities?
Three of the most popular communities are:
What are the popular free SSIS extensions?
An example of popular free SSIS extensions is:
- SSIS Community Samples: This project contains Integration Services samples and components created by the product team.
- BI developer extensions: A Visual Studio extension that enhances development functionality in Business Intelligence Development Studio (BIDS) and SQL Server Data Tools (SSDT).
- SSIS JSON Source component: An open-source SSIS component that allows reading data from JSON.
- SSIS RabbitMQ component: Custom SSIS Components for RabbitMQ
What are the leading SSIS components providers in the Visual Studio marketplace?
As an example of the leading companies that develop custom SSIS tasks and components:
If you are new to SSIS, take a look at the SSIS tutorial: SQL Server Integration Services Tutorial
Feel free to check the following series published on MSSQLTips, which illustrates SSIS performance optimization techniques and the bad habits that you should avoid:
- SSIS Bad Habits: Decreasing SSIS Package Performance
- SSIS Bad Habits: Inefficient Data Staging Strategies
- SSIS Bad Habits: Wrong OLE DB components configuration
- SSIS Bad Habits: Inefficient data staging strategies - Part 2
- SSIS Bad Habits: Slow data transformations lookup
- SSIS Bad Habits: Using Sort and Merge Join Components
About the author
View all my tips
Article Last Updated: 2023-02-03