Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Improving data flow performance with SSIS AutoAdjustBufferSize property


By:   |   Read Comments (1)   |   Related Tips: More > Integration Services Performance

Attend these FREE MSSQLTips webcasts >> click to register


Problem

In the new release of SQL Server 2016, a property has been added to the Integration Services (SSIS) data flow: AutoAdjustBufferSize. What is this property and how can I use it?

Solution

SQL Server 2016 Preview

At the time of writing, SQL Server 2016 is still in preview (currently CTP 3.3 has been released). This means functionality or features of Integration Services might change, disappear or be added in the final release.

The AutoAdjustBufferSize Property in SQL Server 2016

This new property has been added to the data flow and you can find it in the Misc. section of the data flow properties as shown below.

Property

The property can only be set to True or False (the default). But what does this property do?

As explained in the tip Improve SSIS data flow buffer performance, it is the size of the buffers. The data flow uses buffers to transfer and transform the data. The bigger the buffer, the more data can be manipulated at once in memory. Most of the time, having bigger buffers results in faster data flow performance. This is dependent though on other factors, such as the speed of the sources and the destinations, and the type of transformations involved.

The size of the buffer is determined by two properties: DefaultBufferMaxRows - with a default of 10,000 rows - and DefaultBufferSize - with a default of 10MB. However, those defaults are very small compared to the amount of memory that modern servers have available. So every time you developed a new SSIS package, you had to bump up the size specifications of the buffer (for example to 40MB and 100,000 rows). But this could lead to a bit of trial and error, since buffers that are too big might slow down performance.

Imagine you have to fill buckets of water and pass them along a line to put out a fire. If you have bigger buckets, you can transfer more water and the fire will be put out more quickly. But if your buckets are too big, it takes too long to fill them with water while the fire is raging on. So you have to find the optimal size of the buckets to balance the amount of water being passed and the time to fill those buckets. The same is true with data flow buffers: you have to find the optimal size of the buffers.

Instead of manually looking for a good buffer size, the AutoAdjustBufferSize property will do this task for you. When enabled, the data flow will automatically change the size of its buffers, in order to find better throughput performance. The other two buffer size properties (DefaultBufferMaxRows and DefaultBufferSize) are then ignored. Let's test to see if using this property actually leads to better performance.

Test SSIS Data Flow Buffer Performance

This tip uses the same test set-up as this tip Improve SSIS data flow buffer performance. You can find all the scripts to create the test tables and test data in that tip. The SSIS package used to test is also the same. I created three versions of the package:

  • one with all the properties set to their defaults (this means AutoAdjustBufferSize is set to False). This package is called AutoBufferSize_Default.dtsx.
  • one with the buffer size set to 20MB and 30,000 rows. This package is called AutoBufferSize_LargerBuffer.dtsx.
  • one with only the property AutoAdjustBufferSize set to True. The name of the package is AutoBufferSize.dtsx.

Using a stored procedure I ran each package 20 times in a row. The creation of this stored procedure is explained in the blog post SSIS Performance Testing. Aside from just logging the start and end time of the package (which allows us to calculate the runtime duration), I also enabled two performance counters.

Perfmon

You can find more information about collecting performance data in the tip Performance Monitor. The buffer memory counter collects the amount of bytes that is allocated in memory by all the packages. Since we are running only one single package at a time, this is the memory consumption of the package being tested. The buffers in use counter indicates how many buffers there are in use. The output of these counters looks like this:

Perfmon output

The Test Results

With the following T-SQL query, we can easily analyze the run durations of the different packages:

SELECT
     [RunID]
    ,[PackageName]
    ,[Duration] = DATEDIFF(MILLISECOND,StartDate,EndDate) / 1000.0
    ,[Mean] = PERCENTILE_CONT(0.5) WITHIN GROUP
    				(ORDER BY DATEDIFF(MILLISECOND,StartDate,EndDate) / 1000.0)
                    OVER (PARTITION BY PackageName)
FROM [dbo].[PackageLogging]
ORDER BY ID;

Visualized in Power BI Desktop, the results look like this:

Durations chart

We can see that enlarging the buffers from the default values to 20MB and 30,000 rows already had a great impact on performance. The median runtime went from just below 50 seconds to about 30 seconds. The new AutoAdjustBufferSize property scored slightly better with a median value of 27 seconds. Using a box and whisker plot, we can see that the runtimes of the two fastest packages are indeed close together. However, the entire set of samples of the AutoAdjustBufferSize package falls clearly below the set of samples of the larger buffer package. The default package obviously performs the worst. It has a quite erratic distribution of runtimes, where the minimum and maximum values are quite spread out.

Box & whiskers

Now let's take a look at the memory usage. By dividing the total memory consumption by the amount of buffers used, we can find the memory per buffer. The X-axis represents the ID of the sample number.

Memory usage

The default package uses the lowest amount of memory, with a buffer size of about 5MB. This means the maximum of rows (DefaultBufferMaxRows) was a limiting factor on the buffer size. Strangely, setting the number of rows to 30,000 and the buffer size to 20MB leads to an average buffer memory of about 10.5MB. Finally, the AutoAdjustBufferSize uses the most memory, with an average buffer size of about 25MB. It's important to note though that using about 2.5 times the amount of memory than the package with the larger buffers results in a gain of only 3 seconds.

Conclusion

The AutoAdjustBufferSize property lives up to its promise: it automatically scales the buffer of the data flow in order to improve performance. The only downside is that this new property is turned off by default.

Next Steps


Last Update:


signup button

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Wednesday, July 27, 2016 - 6:41:26 AM - Koen Verbeeck Back To Top

UPDATE:

the tip states that the DefaultBufferMaxRows is ignored when AutoAdjustBufferSize is set to true. This is not correct, only DefaultBufferSize is ignored. The DefaultBufferMaxRows is still used to calculate the size of the buffer.


Learn more about SQL Server tools