SSIS Interview Questions for Memory Management and Deployment

By:   |   Comments (4)   |   Related: More > Professional Development Interview Questions BI


Problem

Are you an aspiring SQL Server Integration Services Professional?  Do you need to prepare for an upcoming SSIS interview?  Check out this tip series (tip 1, tip 2, tip 3, tip 4) on interview questions and this tip related to SQL Server Integration Services memory management and package deployment. 

Solution

What are the different properties which control the number of rows to be stored in the data buffers?

  • Each Data Flow Task has two properties (DefaultMaxBufferRows and DefaultMaxBufferSize) which control the number of rows which can be stored in single buffer and how many buffers can be created. The size of each row plays a pivotal role in deciding the number of rows which can be stored in a single buffer. DefaultMaxBufferRows property has default value of 10,000 rows and specifies the maximum number of rows that can be stored in each individual buffer.
  • The DefaultBufferSize property has a default value of 10 MB and specifies the maximum size of each individual buffer. Please note maximum and minimum size of each individual buffer is also constrained by two internal properties: MinBufferSize = 64 KB and MaxBufferSize = 100MB. As a best practice, you should configure these properties in such a way, that SSIS creates a small number of large buffers instead of creating a large number of small buffers. Also keep in mind, you should not make these values too large, especially if you minimal memory on the system, to a point where SSIS starts swapping/spooling data to disk.  For more information click here.

How does BLOB data get stored in the data buffer and what happens if SSIS does not find enough buffers to hold the BLOB data?

  • As long as the BLOB data can fit in the data buffers (created out of memory) it is stored in the data buffers. However, the size of BLOB data is often too large to fit in data buffers, so in this circumstance SSIS spools the BLOB data to the file system at the default location or at a specified location.
  • Any data, including BLOB data,  which cannot be stored in the data buffers, because of memory scarcity or memory pressure, is spooled to the file system.

When and where are data buffers are spooled to disk? How can this issue be avoided?

  • Often times, the size of BLOB data is too large to fit in the data buffers, so SSIS spools the BLOB data to the file system at the default location or at a specified location. However, anytime data cannot be stored in the data buffers because of memory scarcity or memory pressure, the data is spooled to the file system.
  • By default SSIS derives the physical locations to store buffer overflow data and BLOB overflow data from the TEMP/TMP environment variable.  You can change this default behavior by changing these two properties of data flow task:
    • BufferTempStoragePath
    • BLOBTempStoragePath.
  • Note - sometimes spooling causes your system to run slow because of the input/output involved. As such,  you should try to optimize the data in data buffers by adding more RAM or removing asynchronous transformations to the extent possible. If this is not possible, then try to specify the physical location for the spooling to the fastest drives that you have available. You can learn more this here.

What is parallel processing in SSIS and how do you control it?

  • One of the biggest benefits of using SSIS is it's ability to process data in parallel. You can have multiple tasks run in parallel for your ETL operations and utilize the power of your today's multi-core/multi-processer machines.
  • Each SSIS package has MaxConcurrentExecutables property which has its default value as -1; which means the maximum number of tasks that can be executed is equal to the total number of processors on the machine plus two. For example, on a four processor machine can execute 6 tasks in parallel in a package with this default property value. You can change this default value to an absolute number to indicate absolute number of tasks that can be executed in parallel.
  • Each Data flow task has the EngineThreads property which specifies the total number of threads that can be created for executing the data flow task. In SQL Server 2008, the default value for this property is 10, which means 10 threads can be allocated at one time to execute different execution trees of the data flow task in parallel.

Which languages are available to write code for the Script Task or Script Component?

  • In SQL Server 2005, the only allowed language for the  for the Script Task or Script Component was Visual Basic. Starting with SQL Server 2008 you can use either Visual Basic .NET or C# to write code for the Script Task or Script Component. You also have access to the .NET base class library.

Can you explain the differences between the Script Task vs. the Script Component?

  • The purpose of the Script Task and Script Component is to extend the functionality of SSIS by writing your own scripts if no built-in task or transformation meets your needs. Though the purpose is same for both items, where these tasks are used is different.
    • The Script Task is used in control flow to accomplish almost any general-purpose task, like connecting to web services, sending out HTML formatted mails, etc.
    • The Script Component is used in the data flow designer to extend the functionality of the source, transformation and destination and typically runs for each row. For example, you can use Script Component to generate the hash of each incoming row and pass the value throughout the package. To learn more about the tasks click here.

Can you explain the difference between the Multicast Transformation vs. the Conditional Split Transformation?

  • The basic difference between Multicast Transformation and Conditional Split transformation is that Multicast Transformation generates exact copies of the source/incoming data, it means each recipient will have same number of records as the source. The Conditional Split Transformation divides the source/incoming data on the basis of defined conditions and if no rows match with the defined conditions those rows are the default output, to learn about it with an example refer to the SSIS Multicast Transformation vs. Conditional Split Transformation tip.

I know SSIS has the built-in Send Mail task for sending out mails, but does that task also allow HTML formatted mails to be sent?

  • The Send Mail task is used to send out plan text email only. For HTML formatted mails, there is no standard built-in task, but you can script this functionality in a Script Task and achieve the same results.  To learn about how we can create/write a script task with an example refer to the Sending HTML Formatted Email in SQL Server using-the SSIS Script Task tip.

What are different command line utilities in SSIS?

  • There are basically two command line utilities: DTUTIL (used for managing/moving packages) and DTEXEC (used for executing packages). There are two variants of DTEXEC. One is used as purely a command line utility and all the configuration information is passed on the command prompt as switches. The second variant is DTEXECUI which provides a GUI for configuring package before execution and monitoring the package execution. To learn more about these tools refer to these resources:

What are the different options for SSIS package deployment?

Next Steps

Review the following resources for your next SQL Server Integration Services interview:

Check out the previous tips in this series:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Arshad Ali Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, January 17, 2012 - 7:35:49 AM - Jeremy Kadlec Back To Top (15677)

Arshad,

Thank you for your feedback.

Thank you,
Jeremy Kadlec


Tuesday, January 17, 2012 - 1:10:35 AM - Arshad Back To Top (15675)

Well I would say all these questions are equally important to know and understand about because they cover different areas:
For example, say your package is not performing as expected and while troubleshooting you identified that data is getting spooled to the disk during data integration. Now the questions are what caused this data spooling, how can we avoid it and if not then how can we minimize it or make it perform better?

In another scenario, say as a SSIS developer you are done with development and you have to give these packages to Operation team for deployment on UAT and Production, then what are different ways you can let operation team deploy these packages, how can you automate the process of deployment, configuration and management using the command line utilities?

Hope it helps! If you have any specific doubt do let me know, I will try to answer it.

I would highly recommend to review my earlier tips on SSIS best practices and design considerations: http://www.mssqltips.com/sqlservertip/1840/sql-server-integration-services-ssis-best-practices/


Monday, January 16, 2012 - 4:03:47 PM - Jeremy Kadlec Back To Top (15665)

Arshad,

For the questions you have posted in this tip, which one do you think SQL Server Professionals struggle with the most and why?

Thank you,
Jeremy Kadlec


Friday, January 13, 2012 - 9:15:38 AM - Javier Back To Top (15641)

Great info...















get free sql tips
agree to terms