SSIS Interview Questions for Data Profiling, Protection Level and Package Configuration
When preparing for SQL Server Integration Services interview be sure to understand what questions could be asked in the interview. In this tip series (tip 1, tip 2, tip 3, tip 4), I will try to cover some common Integration Services interview questions to help you prepare for an SSIS interview. In this tip we will cover interview questions related to the data profiling, protection level, package configuration and more.
How do you perform data profile in SQL Server Integration Services? What are the different kinds of data profiling in SSIS?
- Analyzing the data quality has been a challenge for SSIS developers. The SQL Server development team understands this challenge and provided a new Data Profile Task in SQL Server 2008. You can use this task for profiling data during the process of extraction, transformation and loading of data. This new task allows you to save the profile result as XML in an SSIS variable or in a file. The SQL Server development team also provided a Data Profiler Viewer tool which is used to analyze collected profile data.
- The Data Profiling Task allows you to profile data in eight different profiles. Five of these profiles analyze individual columns and the remaining three profiles analyze multiple columns or relationships between columns and tables. For more details click here.
What are the options to generate an SQL Server Integration Services Package?
- There are three different ways to create SSIS packages. First, you can use BIDS (Business Intelligence Development Studio) to create a package in the rich graphical user interface. Second, you can use the Import and Export Wizard, which allows you to create a simple package to move data from the source to destination databases without any transformations. Third, you can use the SSIS API objects to programmatically create and execute packages.
What features does the SQL Server Integration Services Import and Export Wizard provide? Is there anything new in this wizard with SQL Server 2008?
- The Import and Export Wizard allows you to quickly move data between a source and destination database. Although this tool can move data quickly, you cannot perform any sort of transformation other than a data type conversion. Behind the scenes, this wizard generates an SSIS package to transfer data and also allows you to save these packages for later use or to schedule to run on defined time interval.
How do you debug an SSIS package? What is a breakpoint in SQL Server Integration Services? How can you see the data passing through the data flow pipeline?
- While executing a package from the BIDS designer you get an opportunity to stop the package by setting a breakpoint on a particular task, or even on multiple tasks, so that the execution of the package stops at each breakpoint. At this point in time you can verify the values of all the variables for debugging purposes or understand what runtime values are being used.
- The Data Flow Designer in BIDS allows you to create a Data Viewer on the data path which displays the data passing through that path during the package execution.
How can you document your Integration Services package? What is an annotation in an SSIS package?
- To add information about each individual task in SSIS package, you can add text in the package using the annotation. Right click on the design surface and click on Annotation to write text at that location.
What an SSIS Lookup Transformation and what new features did SQL Server 2008 deliver?
- Often times you need to pull incremental data from a source database and merge it in a destination table. You can use the T-SQL Merge command to merge this delta data from your staging table to main table, if you store your data in staging table before load, or use the Lookup transformation which allows you to merge data on the fly, in-memory, to the destination table. The lookup transformation performs a join between the source and reference table/view. With SQL server 2008, this transformation has been further enhanced to output rows for matched records, rows for unmatched records and error output. Apart from that, the enhancement also includes the ability to cache your referenced data so that you can reuse it multiple times instead of querying it every time. For more information click here.
What is Change Data Capture (CDC)?
- Change Data Capture (CDC) is a new feature in SQL Server 2008 and later versions to capture all of the modifications (insert, update and delete) made to a particular table on which the CDC feature has been enabled. Once enabled, CDC will capture all these modifications into a separate table in an asynchronous manner. SQL Server reads the transaction log for the changes to a CDC enabled table and records the changes in another table with a similar structure along with the meta information about the kind of changes. Learn more about Change Data Capture.
What is the new command in SQL Server 2008 for UPSERT functionality?
- Beginning with SQL Server 2008, you can use the MERGE T-SQL command to perform these Insert, Update and Delete operations in a single statement. This new command is similar to the UPSERT (fusion of the words UPDATE and INSERT) command in Oracle. This command inserts rows that do not exist, updates the rows that do exist and deletes the rows which exists in destination, but not in source. With the introduction of the MERGE T-SQL command, developers can more effectively handle common data warehousing scenarios, like checking whether a row exists and then executing an insert or update or delete.
- The MERGE statement basically merges data from a source result set to a target table based on a condition that you specify and if the data from the source already exists in the target or not. The new T-SQL command combines the sequence of conditional INSERT, UPDATE and DELETE commands in a single atomic statement, depending on the existence of a record. Learn more about the Merge command.
What are the different SQL Server Integration Services Protection Levels and what do they mean?
- As part of the package information, some sensitive information like connection stings, user ids, passwords, etc. are stored to execute the package. To protect this sensitive information, SSIS provides a mechanism to encrypt the data. If you have some variables which contain sensitive information you can set the Sensitive property of that variable to TRUE to include them for encryption. You can set different level of protection using a package property called ProtectionLevel.
- EncryptSensitiveWithUserKey - Encrypts sensitive information with the user/developer profile and is the default protection level.
- EncryptSensitiveWithPassword - Encrypts sensitive information with a password, you will need to provide a password during development for encryption and decryption of the sensitive information. When you or someone else opens/runs the package, she/he needs to provide the same password again for decryption.
- EncryptAllWithUserKey - Encrypts all information (including sensitive) with the user/developer profile.
- EncryptAllWithPassword - Encrypts all information (including sensitive) with a password, you will need to provide password during development for encryption and decryption of these sensitive information. When you or someone else opens/runs the package, she/he needs to provide the same password again for decryption.
- DontSaveSensitive - It instructs SSIS to not store the sensitive information inside the package and let user provide them again when they are opening/executing it again.
- ServerStorage - Protects the whole package using SQL Server database roles and this protection level is only applicable if you are storing the package in the msdb database.
- Learn more about the SQL Server Integration Services Protection Levels..
How can you configure your Integration Services package to run with a different set of parameters like credentials, connection strings, etc. during runtime without modifying the package?
- As an SSIS package is tested and deployed a few different teams will be accessing the package with different sets of credentials, connecting to different sources and destinations, etc. So now the question is how we can configure the package with different set of values and without actually modifying the package. SSIS has very robust mechanism to configure your package at runtime and those runtime values can come from different sources as follows:
- Environment variables
- XML configuration file
- A SQL Server table
- Parent package variable
- The SSIS package configuration lets you configure your package dynamically with different sets of parameter values or new values. Learn more about SQL Server Integration Services Package Configuration.
What are the different SSIS maintenance tasks?
- Apart from different tasks used for ETL (Extraction, Transformation and Loading) SSIS also provides several built-in maintenance tasks which can be used for repetitive administrative tasks for a SQL Server instance like backup, restore, index rebuild, etc.
How can you have an SSIS package run on a defined interval?
- You can create a SQL Server Agent Job, define your job step to execute an SSIS package, pass runtime values from the job to an SSIS package and define a job schedule to execute as needed. By default, SSIS packages get executed under the security context of SQL Server Agent service account, if you want the package with a different account, you can create an SSIS proxy account.
- Check out the following resources as you prepare for your next interview:
About the author
View all my tips
Article Last Updated: 2012-02-03