By: Daniel Calbimonte | Comments | Related: > Integration Services Performance
Problem
I need to count the number of rows of a SQL Server table with SQL Server Integration Services (SSIS) and pass the parameter to a script. Do you recommend using the SSIS row count functionality or the T-SQL COUNT() function?
On a related note, I need to copy a large table with millions of rows. Do you recommend using a T-SQL INSERT statement or the SSIS multicast task?
Solution
We are going to try both options T-SQL and SSIS to address both the row count and data insertion needs. You will be able to compare the results by yourself!
Requirements
- SQL Server installed.
- SSIS installed.
- SQL Server Data Tools installed (SSDT) or BIDS (Business Intelligence Development Studio).
- You can use SQL Server 2005 or later versions. In this example, we are using SQL Server 2014.
- A SQL Server table (in our example it has a million rows), but you can try with a smaller one.
Getting started
Row Count via SSIS Task vs. T-SQL
- We will try first to count the rows with SSIS and pass the value to a SSIS script task.
- First, we will try with SSDT / BIDS, by creating a new Integration Service Project then dragging and dropping the Data Flow Task and a Script Task:
- Double click on the Data Flow Task and add the OLE DB Source and connect to the Row Count:
- Double click the OLE DB Source and connect to the SQL Server table with a million rows:
- In the menu, go to SSIS variables and create a variable of the type Int32:
- In the Row Count Task in SSIS, select the variable created on step 4. This variable will store the number of rows in the table:
- We will use the Script Task to show the value in a Message Box:
- The following code converts the value of the variable to string and then it shows the value.
- Run the package:
- It will take 13 seconds to count 1 million rows:
- Now we will try to calculate the row count with T-SQL. We will use the Execute SQL Task. We will count the rows using a select count and pass the value to a script task:
- Double click the Execute SQL Task and create a connection to the SQL Server Database and go to the SQL Statement to write a query:
- The following query will allow us to count the identifiers of the table with one million rows and store the value in a variable:
- To store the number of rows in a SSIS variable named TCounter, click on Parameter Mapping and create a new variable named TCounter or any other name you prefer. In direction, select ReturnValue. The Data Type will be LONG and the parameter name will be 0. We will use this variable to store the number of rows obtained on the query of the step 13:
- The script task will be the same that the one done in step 7 and 8.
- Run the package:
- Verify the execution time:
- As you can see, the count row run in 13 seconds and the T-SQL with the count sentence just 1 second. The advantage of the rowcount is that it is easier to use and to pass parameters, but the disadvantage is that it is slower. Another advantage of the rowcount is that it is applicable to any data source like Excel, Oracle, a Flat file, MySQL, SQL Server, etc. while the count function is applicable to SQL Server databases in the T-SQL language.
public void Main()
{
// TODO: Add your code here
string countrows;
countrows = Dts.Variables["User::countrows"].Value.ToString();
MessageBox.Show(countrows);
Dts.TaskResult = (int)ScriptResults.Success;
}
Select ? = count([BusinessEntityID]) FROM [AdventureWorks2014].[dbo].[excelcustomer2]
Multicast in SSIS vs. T-SQL
- The second question is to test Multicast vs. T-SQL. Which one is faster?
- We will try SSIS multicast first. We will create 2 copies of the table using this task.
- In SSDT / BIDS, create a package then drag and drop the Data Flow Task.
- Double click on the Data Flow Task. Create a OLE DB Source as a multicast and two SQL Server destinations. We will copy the table of 1 million rows into 2 different tables using the Multicast. The OLE DB Source will be used to connect to the SQL Server with the table with million of rows. The multicast will be used to create the copies. The SQL Server Destinations will help have the tables where we want to store copies of the data.
- Run the package. Note that the two SQL Server Destinations run in parallel.
- To create 2 copies of the SQL table took 49 seconds. Let's try with T-SQL.
- We will create two T-SQL Statement tasks to copy the data. One task for each destination table. In the SSDT / BIDS drag and drop the Execute T-SQL Statement task.
- Add a statement to insert data from the table with a million rows (excelcustomer2) to a new table (dbo.dest1). In Connection press New. Note that in T-SQL you may need to create the dest1 table manually while the multicast can create it with a simple click.
- Write any connection name and enter the SQL Server Name. Select an authentication mode.
- In the other Execute T-SQL Statement Task, we are going to copy the data from the excelcustomer2 table to the dest2 table.
- Run the package and go to the Execution Results. The execution time is 42 seconds.
insert into AdventureWorks2014.dbo.dest1 select * from AdventureWorks2014.[dbo].[excelcustomer2]
Conclusion
In this new tip, we found that the T-SQL is faster than the SSIS rowcount task. Here we have the results:
Solution | Time |
---|---|
SSIS Row Count Task | 13 seconds |
SQL Count Statement | 1 second |
As you can see, the Row Count is not recommended for big amounts of data. 13 seconds may be acceptable if it is a simple SSIS package, but if it is a complex SSIS package, it is better to optimize the execution time as much as possible. I would recommend Row Count if you cannot find another way to count values or to use it with small tables.
In the second test we compared the multicast vs. T-SQL INSERT statement. Here you have the results:
Solution | Time |
---|---|
SSIS Multicast Task | 49 seconds |
SQL Insert Into Select Statements | 42 seconds |
In this scenario, the insert statement was faster than the multicast task. The multicast has the advantage that it is multi-source. Which means that it can work on any Data Source (Excel, Oracle, MySQL, Access, etc.). If you cannot find a way to copy the file using the Source tools (PL-SQL, Excel functions, etc.) the multicast can help you to reduce your researching time, but the price is the performance. In many cases, the SSIS packages run at night and they take 1-2 hours. If the time is not a problem, the multicast task option maybe acceptable.
Next Steps
There are several links that will be useful to you:
- Row count transformation.
- Multicast transformation
- Count
- Check out all of the SQL Server Integration Services tips.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips