Learn more about SQL Server tools

   
   






















































Connect with MSSQLTips




MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Pinterest Page MSSQLTips Twitter Page MSSQLTips Google+ Page

Making data imports into SQL Server as fast as possible

MSSQLTips author Greg Robidoux By:   |   Read Comments (2)   |   Related Tips: More > Integration Services Performance

Problem
When bulk loading data into SQL Server, in most cases you want to have this process run as fast as possible.  Depending on the options you select and how indexes are setup on the table, the time to load the data could differ quite drastically.  In this tip we will take a look at different options for bulk loading along with examples and the time it takes for each of these operations to complete.

Solution
There are several ways to load data into SQL Server and export data from SQL Server here are a few of those previous tips.  Refer to these tips if you are not familiar with this process.


In our tests we took six different approaches for loading the data into the table. 

  1. Table has a clustered and 2 non-clustered indexes
  2. Table has only a clustered index
  3. Table has no indexes
  4. Table has no indexes and used "TABLOCK" hint
  5. Table has only a clustered index and used "TABLOCK" hint
  6. Table has a clustered and 2 non-clustered indexex

In addition, the database was set to Bulk-Logged recovery.

For these tests after each iteration the database and log were backed up to clear the log file.  Also, before each step the table was dropped and recreated.

Test 1

The first run was done using a dataset of 121,317 rows and an empty table. Before each run the table was recreated using the specifics for that test.

As you can see from the results below there is a difference depending on the process that you use to load the data.

ID Test Run(1) Run(2)
1 with clustered index and 2 non-clustered indexes 5.1 5.3
2 with clustered index only 3.2 3.0
3 with no indexes 1.4 1.4
4 with no indexes and tablock 1.2 1.3
5 with clustered index only and tablock 2.8 2.5
6 with clustered index and 2 non-clustered indexes and tablock 4.1 3.9

Test 2

The next run was done using a dataset of 242,634 rows and an empty table. Before each run the table was recreated using the specifics for that test.

Again, you can see there are quite a few differences for these runs.

ID Test Run(1) Run(2)
1 with clustered index and 2 non-clustered indexes 14.0 13.8
2 with clustered index only 6.9 7.3
3 with no indexes 2.7 2.7
4 with no indexes and tablock 2.5 2.5
5 with clustered index only and tablock 5.6 5.5
6 with clustered index and 2 non-clustered indexes and tablock 8.4 8.7

Test 3

In this test we ran two loads.  The first load was done using a dataset of 121,317 rows and an empty table and then we loaded another 121,317 records into the same table. Before each run the table was recreated using the specifics for that test.

ID Test Load(1) Load(2) Total
1 with clustered index and 2 non-clustered indexes 8.4 7.7 16.1
2 with clustered index only 3.5 3.2 6.7
3 with no indexes 1.5 1.4 2.9
4 with no indexes and tablock 1.3 1.3 2.6
5 with clustered index only and tablock 3.1 4.0 7.1
6 with clustered index and 2 non-clustered indexes and tablock 4.0 8.5 12.5

Summary

As you can see from these results and these small loads the various options do effect the time it takes the process to complete.  Therefore for faster loads you may want to load your tables without any indexes and then later create the indexes after the load.  The one step we did not do is to see how much time the index creation would take if this was done after the data load.

In addition, make sure your data and log files have sufficient space to complete the operation without having to autogrow these files.  The autogrow process can significantly effect the overall load time.

Lastly, when using the tablock option your operations will be minimally logged which is a good thing, but take a look at this previous tip, Minimally Logging Bulk Load Inserts into SQL Server, as to how this works depending on the state of the table. This is one of the reasons why in Test 3 the second load took longer to complete since there was already data in the table.

 

Next Steps

 



Last Update: 11/20/2007


About the author
MSSQLTips author Greg Robidoux
Greg Robidoux is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.

View all my tips


print tip Print  
Become a paid author





join MSSQLTips for free SQL Server tips     



Learn more about SQL Server tools
Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Tuesday, January 05, 2010 - 12:30:52 PM - admin Read The Tip

One thing this test does not do in the tip is to build the indexes after the data load.  So you are probalby right that removing the clustered index, loading the data and then creating the clustered index does not save a lot of time.

Also, another point that should have been added is if there is a need to have a clustered index it helps having the data being loaded sorted based on the clustered key to save some processing time.

Here is another tip that references how bulk loading is performed and use of the transaction log: http://www.mssqltips.com/tip.asp?tip=1185.  This also impacts how long the process will take.

I do not have any tests that show dropping and recreating versus keeping them in place, but that would be a good test to perform.

The newsletter recycles older tips, that is why this one came up today.  The top entry is the newest tip and the other tips are pulled from the different categories of tips.


Tuesday, January 05, 2010 - 11:58:50 AM - Dwainew Read The Tip

Written in 2007, but included in a newsletter sent today.... my 2 cents:

Every time I've tested loading into a heap and then building a CI vs. loading into a CI, the difference has been negligible.  If this is a cyclical process, you also have to account for the time required to REMOVE the CI before starting the load.

 considering the time and maintenance required to maintain scripts that drop and recreate the CI, I don't think it's worth it.

 Regarding NCIs, that's a different story due to the number and potential size, but the ability to disable an index before loading saves from having to drop and recreate.

 Any hard data  on these processes would be appreciated.




 
Sponsor Information