SSIS Interview Questions for Memory Management and Deployment
By: Arshad Ali | Comments (4) | Related: More > Professional Development Interview Questions BI
Are you an aspiring SQL Server Integration Services Professional? Do you need to prepare for an upcoming SSIS interview? Check out this tip series (tip 1, tip 2, tip 3, tip 4) on interview questions and this tip related to SQL Server Integration Services memory management and package deployment.
What are the different properties which control the number of rows to be stored in the data buffers?
- Each Data Flow Task has two properties (DefaultMaxBufferRows and DefaultMaxBufferSize) which control the number of rows which can be stored in single buffer and how many buffers can be created. The size of each row plays a pivotal role in deciding the number of rows which can be stored in a single buffer. DefaultMaxBufferRows property has default value of 10,000 rows and specifies the maximum number of rows that can be stored in each individual buffer.
- The DefaultBufferSize property has a default value of 10 MB and specifies the maximum size of each individual buffer. Please note maximum and minimum size of each individual buffer is also constrained by two internal properties: MinBufferSize = 64 KB and MaxBufferSize = 100MB. As a best practice, you should configure these properties in such a way, that SSIS creates a small number of large buffers instead of creating a large number of small buffers. Also keep in mind, you should not make these values too large, especially if you minimal memory on the system, to a point where SSIS starts swapping/spooling data to disk. For more information click here.
How does BLOB data get stored in the data buffer and what happens if SSIS does not find enough buffers to hold the BLOB data?
- As long as the BLOB data can fit in the data buffers (created out of memory) it is stored in the data buffers. However, the size of BLOB data is often too large to fit in data buffers, so in this circumstance SSIS spools the BLOB data to the file system at the default location or at a specified location.
- Any data, including BLOB data, which cannot be stored in the data buffers, because of memory scarcity or memory pressure, is spooled to the file system.
When and where are data buffers are spooled to disk? How can this issue be avoided?
- Often times, the size of BLOB data is too large to fit in the data buffers, so SSIS spools the BLOB data to the file system at the default location or at a specified location. However, anytime data cannot be stored in the data buffers because of memory scarcity or memory pressure, the data is spooled to the file system.
- By default SSIS derives the physical locations to store buffer overflow data and BLOB overflow data from the TEMP/TMP environment variable. You can change this default behavior by changing these two properties of data flow task:
- Note - sometimes spooling causes your system to run slow because of the input/output involved. As such, you should try to optimize the data in data buffers by adding more RAM or removing asynchronous transformations to the extent possible. If this is not possible, then try to specify the physical location for the spooling to the fastest drives that you have available. You can learn more this here.
What is parallel processing in SSIS and how do you control it?
- One of the biggest benefits of using SSIS is it's ability to process data in parallel. You can have multiple tasks run in parallel for your ETL operations and utilize the power of your today's multi-core/multi-processer machines.
- Each SSIS package has MaxConcurrentExecutables property which has its default value as -1; which means the maximum number of tasks that can be executed is equal to the total number of processors on the machine plus two. For example, on a four processor machine can execute 6 tasks in parallel in a package with this default property value. You can change this default value to an absolute number to indicate absolute number of tasks that can be executed in parallel.
- Each Data flow task has the EngineThreads property which specifies the total number of threads that can be created for executing the data flow task. In SQL Server 2008, the default value for this property is 10, which means 10 threads can be allocated at one time to execute different execution trees of the data flow task in parallel.
Which languages are available to write code for the Script Task or Script Component?
- In SQL Server 2005, the only allowed language for the for the Script Task or Script Component was Visual Basic. Starting with SQL Server 2008 you can use either Visual Basic .NET or C# to write code for the Script Task or Script Component. You also have access to the .NET base class library.
Can you explain the differences between the Script Task vs. the Script Component?
- The purpose of the Script Task and Script Component is to extend the functionality of SSIS by writing your own scripts if no built-in task or transformation meets your needs. Though the purpose is same for both items, where these tasks are used is different.
- The Script Task is used in control flow to accomplish almost any general-purpose task, like connecting to web services, sending out HTML formatted mails, etc.
- The Script Component is used in the data flow designer to extend the functionality of the source, transformation and destination and typically runs for each row. For example, you can use Script Component to generate the hash of each incoming row and pass the value throughout the package. To learn more about the tasks click here.
Can you explain the difference between the Multicast Transformation vs. the Conditional Split Transformation?
- The basic difference between Multicast Transformation and Conditional Split transformation is that Multicast Transformation generates exact copies of the source/incoming data, it means each recipient will have same number of records as the source. The Conditional Split Transformation divides the source/incoming data on the basis of defined conditions and if no rows match with the defined conditions those rows are the default output, to learn about it with an example refer to the SSIS Multicast Transformation vs. Conditional Split Transformation tip.
I know SSIS has the built-in Send Mail task for sending out mails, but does that task also allow HTML formatted mails to be sent?
- The Send Mail task is used to send out plan text email only. For HTML formatted mails, there is no standard built-in task, but you can script this functionality in a Script Task and achieve the same results. To learn about how we can create/write a script task with an example refer to the Sending HTML Formatted Email in SQL Server using-the SSIS Script Task tip.
What are different command line utilities in SSIS?
- There are basically two command line utilities: DTUTIL (used for managing/moving packages) and DTEXEC (used for executing packages). There are two variants of DTEXEC. One is used as purely a command line utility and all the configuration information is passed on the command prompt as switches. The second variant is DTEXECUI which provides a GUI for configuring package before execution and monitoring the package execution. To learn more about these tools refer to these resources:
What are the different options for SSIS package deployment?
- There are several ways you can deploy your SSIS package to either a File, the SSIS Package store or MSDB:
- DTUTIL command line utility
- Connect to the Integration Services instance in SSMS and Import a package
- Create an SSIS deployment manifest from your SSIS project and use the SSIS Deployment Wizard to install all the packages of the project together
- Create a tool by creating some of the undocumented stored procedures for deploying the packages, you can refer to this tip for a list of some of these stored procedures
Review the following resources for your next SQL Server Integration Services interview:
- SQL Server Integration Services Tutorial
- SQL Server Integration Services SSIS Best Practices tips
- Developer Interview Questions
Check out the previous tips in this series:
- SSIS Interview Questions - Part 1
- SSIS Interview Questions for Basic Concepts and Event Logging
- SSIS Interview Questions on Transactions, Event Handling and Validation
- SSIS Architecture and Internals Interview Questions
About the author
Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.
View all my tips
View all my tips