Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Integration Services Performance Comparison for Row Counts and Data Imports


By:   |   Updated: 2016-05-03   |   Comments   |   Related: More > Integration Services Performance

Problem

I need to count the number of rows of a SQL Server table with SQL Server Integration Services (SSIS) and pass the parameter to a script. Do you recommend using the SSIS row count functionality or the T-SQL COUNT() function?

On a related note, I need to copy a large table with millions of rows. Do you recommend using a T-SQL INSERT statement or the SSIS multicast task?

Solution

We are going to try both options T-SQL and SSIS to address both the row count and data insertion needs. You will be able to compare the results by yourself!

Requirements

  1. SQL Server installed.
  2. SSIS installed.
  3. SQL Server Data Tools installed (SSDT) or BIDS (Business Intelligence Development Studio).
  4. You can use SQL Server 2005 or later versions. In this example, we are using SQL Server 2014.
  5. A SQL Server table (in our example it has a million rows), but you can try with a smaller one.

Getting started

Row Count via SSIS Task vs. T-SQL

  1. We will try first to count the rows with SSIS and pass the value to a SSIS script task.
  2. First, we will try with SSDT / BIDS, by creating a new Integration Service Project then dragging and dropping the Data Flow Task and a Script Task:

  3. Data Flow Task and Script Task

  4. Double click on the Data Flow Task and add the OLE DB Source and connect to the Row Count:


  5. OLE DB Source and Row Count

  6. Double click the OLE DB Source and connect to the SQL Server table with a million rows:


  7. OLE DB Source Editor

  8. In the menu, go to SSIS variables and create a variable of the type Int32:


  9. SSIS Variables

  10. In the Row Count Task in SSIS, select the variable created on step 4. This variable will store the number of rows in the table:


  11. Assign the Row Count to the User::Countrows variable

  12. We will use the Script Task to show the value in a Message Box:


  13. Script Task Editor

  14. The following code converts the value of the variable to string and then it shows the value.


  15. The following code converts the value of the variable to string and then it shows the value.

    public void Main()
    {
    // TODO: Add your code here
    string countrows;
    countrows = Dts.Variables["User::countrows"].Value.ToString();
    MessageBox.Show(countrows);
    Dts.TaskResult = (int)ScriptResults.Success;
    }
  16. Run the package:
  17. Run the SSIS Package
  18. It will take 13 seconds to count 1 million rows:
  19. Review the SSIS Task Elapsed Time

  20. Now we will try to calculate the row count with T-SQL.  We will use the Execute SQL Task. We will count the rows using a select count and pass the value to a script task:


  21. Calculate the row count with T-SQL

  22. Double click the Execute SQL Task and create a connection to the SQL Server Database and go to the SQL Statement to write a query:


  23. Execute SQL Task to capture the row count

  24. The following query will allow us to count the identifiers of the table with one million rows and store the value in a variable:


  25. Capture the row count in T-SQL code


    Select ? = count([BusinessEntityID]) 
    FROM [AdventureWorks2014].[dbo].[excelcustomer2]
  26. To store the number of rows in a SSIS variable named TCounter, click on Parameter Mapping and create a new variable named TCounter or any other name you prefer. In direction, select ReturnValue. The Data Type will be LONG and the parameter name will be 0. We will use this variable to store the number of rows obtained on the query of the step 13:


  27. variable to store the number of rows obtained on the query
  28. The script task will be the same that the one done in step 7 and 8.
  29. Run the package:
  30. Run the SSIS Package
  31. Verify the execution time:
  32. Elapsed time


  33. As you can see, the count row run in 13 seconds and the T-SQL with the count sentence just 1 second. The advantage of the rowcount is that it is easier to use and to pass parameters, but the disadvantage is that it is slower. Another advantage of the rowcount is that it is applicable to any data source like Excel, Oracle, a Flat file, MySQL, SQL Server, etc. while the count function is applicable to SQL Server databases in the T-SQL language.

Multicast in SSIS vs. T-SQL

  1. The second question is to test Multicast vs. T-SQL. Which one is faster?
  2. We will try SSIS multicast first. We will create 2 copies of the table using this task.
  3. In SSDT / BIDS, create a package then drag and drop the Data Flow Task.
  4. Data Flow Task for the Multicast


  5. Double click on the Data Flow Task. Create a OLE DB Source as a multicast and two SQL Server destinations. We will copy the table of 1 million rows into 2 different tables using the Multicast. The OLE DB Source will be used to connect to the SQL Server with the table with million of rows. The multicast will be used to create the copies. The SQL Server Destinations will help have the tables where we want to store copies of the data.

  6. Data Flow Task with a single source, multicast and two destinations


  7. Run the package. Note that the two SQL Server Destinations run in parallel.
  8. Elapsed time to run the SSIS package


  9. To create 2 copies of the SQL table took 49 seconds. Let's try with T-SQL.
  10. We will create two T-SQL Statement tasks to copy the data. One task for each destination table. In the SSDT / BIDS drag and drop the Execute T-SQL Statement task.


  11. Execute T-SQL Statement


  12. Add a statement to insert data from the table with a million rows (excelcustomer2) to a new table (dbo.dest1). In Connection press New. Note that in T-SQL you may need to create the dest1 table manually while the multicast can create it with a simple click.
  13. Execute T-SQL Statement Task


    insert into AdventureWorks2014.dbo.dest1
    select * from AdventureWorks2014.[dbo].[excelcustomer2]
  14. Write any connection name and enter the SQL Server Name. Select an authentication mode.


  15. Connection Properties


  16. In the other Execute T-SQL Statement Task, we are going to copy the data from the excelcustomer2 table to the dest2 table.


  17. Execute T-SQL Statement Task


  18. Run the package and go to the Execution Results. The execution time is 42 seconds.


  19. Elapsed Time for the SSIS Package

Conclusion

In this new tip, we found that the T-SQL is faster than the SSIS rowcount task. Here we have the results:

Solution Time
SSIS Row Count Task 13 seconds
SQL Count Statement 1 second

As you can see, the Row Count is not recommended for big amounts of data. 13 seconds may be acceptable if it is a simple SSIS package, but if it is a complex SSIS package, it is better to optimize the execution time as much as possible. I would recommend Row Count if you cannot find another way to count values or to use it with small tables.

In the second test we compared the multicast vs. T-SQL INSERT statement. Here you have the results:

Solution Time
SSIS Multicast Task 49 seconds
SQL Insert Into Select Statements 42 seconds

In this scenario, the insert statement was faster than the multicast task. The multicast has the advantage that it is multi-source. Which means that it can work on any Data Source (Excel, Oracle, MySQL, Access, etc.). If you cannot find a way to copy the file using the Source tools (PL-SQL, Excel functions, etc.) the multicast can help you to reduce your researching time, but the price is the performance. In many cases, the SSIS packages run at night and they take 1-2 hours. If the time is not a problem, the multicast task option maybe acceptable.

Next Steps

There are several links that will be useful to you:



Last Updated: 2016-05-03


get scripts

next tip button



About the author
MSSQLTips author Daniel Calbimonte Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and Microsoft Certified IT Professional.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools