Making data imports into SQL Server as fast as possible

By: Greg Robidoux | Comments (2) | Related: > Integration Services Performance

Problem

When bulk loading data into SQL Server, in most cases you want to have this process run as fast as possible. Depending on the options you select and how indexes are setup on the table, the time to load the data could differ quite drastically. In this tip we will take a look at different options for bulk loading along with examples and the time it takes for each of these operations to complete.

Solution

There are several ways to load data into SQL Server and export data from SQL Server here are a few of those previous tips. Refer to these tips if you are not familiar with this process.

In our tests we took six different approaches for loading the data into the table.

Table has a clustered and 2 non-clustered indexes
Table has only a clustered index
Table has no indexes
Table has no indexes and used "TABLOCK" hint
Table has only a clustered index and used "TABLOCK" hint
Table has a clustered and 2 non-clustered indexex

In addition, the database was set to Bulk-Logged recovery.

For these tests after each iteration the database and log were backed up to clear the log file. Also, before each step the table was dropped and recreated.

Test 1

The first run was done using a dataset of 121,317 rows and an empty table. Before each run the table was recreated using the specifics for that test.

As you can see from the results below there is a difference depending on the process that you use to load the data.

ID	Test	Run(1)	Run(2)
1	with clustered index and 2 non-clustered indexes	5.1	5.3
2	with clustered index only	3.2	3.0
3	with no indexes	1.4	1.4
4	with no indexes and tablock	1.2	1.3
5	with clustered index only and tablock	2.8	2.5
6	with clustered index and 2 non-clustered indexes and tablock	4.1	3.9

Test 2

The next run was done using a dataset of 242,634 rows and an empty table. Before each run the table was recreated using the specifics for that test.

Again, you can see there are quite a few differences for these runs.

ID	Test	Run(1)	Run(2)
1	with clustered index and 2 non-clustered indexes	14.0	13.8
2	with clustered index only	6.9	7.3
3	with no indexes	2.7	2.7
4	with no indexes and tablock	2.5	2.5
5	with clustered index only and tablock	5.6	5.5
6	with clustered index and 2 non-clustered indexes and tablock	8.4	8.7

Test 3

In this test we ran two loads. The first load was done using a dataset of 121,317 rows and an empty table and then we loaded another 121,317 records into the same table. Before each run the table was recreated using the specifics for that test.

ID	Test	Load(1)	Load(2)	Total
1	with clustered index and 2 non-clustered indexes	8.4	7.7	16.1
2	with clustered index only	3.5	3.2	6.7
3	with no indexes	1.5	1.4	2.9
4	with no indexes and tablock	1.3	1.3	2.6
5	with clustered index only and tablock	3.1	4.0	7.1
6	with clustered index and 2 non-clustered indexes and tablock	4.0	8.5	12.5

Summary

As you can see from these results and these small loads the various options do effect the time it takes the process to complete. Therefore for faster loads you may want to load your tables without any indexes and then later create the indexes after the load. The one step we did not do is to see how much time the index creation would take if this was done after the data load.

In addition, make sure your data and log files have sufficient space to complete the operation without having to autogrow these files. The autogrow process can significantly effect the overall load time.

Lastly, when using the tablock option your operations will be minimally logged which is a good thing, but take a look at this previous tip, Minimally Logging Bulk Load Inserts into SQL Server, as to how this works depending on the state of the table. This is one of the reasons why in Test 3 the second load took longer to complete since there was already data in the table.

Next Steps

Next time you need to import data into SQL Server keep this tip in mind as to other options you may have to make the process run faster.
Take a look at these other tips:

About the author

Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Comments For This Article

Tuesday, January 5, 2010 - 12:30:52 PM - admin

One thing this test does not do in the tip is to build the indexes after the data load. So you are probalby right that removing the clustered index, loading the data and then creating the clustered index does not save a lot of time.

Also, another point that should have been added is if there is a need to have a clustered index it helps having the data being loaded sorted based on the clustered key to save some processing time.

Here is another tip that references how bulk loading is performed and use of the transaction log: http://www.mssqltips.com/tip.asp?tip=1185. This also impacts how long the process will take.

I do not have any tests that show dropping and recreating versus keeping them in place, but that would be a good test to perform.

The newsletter recycles older tips, that is why this one came up today. The top entry is the newest tip and the other tips are pulled from the different categories of tips.

Tuesday, January 5, 2010 - 11:58:50 AM - Dwainew

Written in 2007, but included in a newsletter sent today.... my 2 cents:

Every time I've tested loading into a heap and then building a CI vs. loading into a CI, the difference has been negligible. If this is a cyclical process, you also have to account for the time required to REMOVE the CI before starting the load.

considering the time and maintenance required to maintain scripts that drop and recreate the CI, I don't think it's worth it.

Regarding NCIs, that's a different story due to the number and potential size, but the ability to disable an index before loading saves from having to drop and recreate.

Any hard data on these processes would be appreciated.