Learn more about SQL Server tools

   
   






























































Latest from MSSQLTips

Find Tables, Columns or Stored Procedures used in a SQL Server Reporting Services Report

Dynamically Refresh SQL Server Reporting Services Report Date Parameters

SQL Server 2016 T-SQL Syntax to Query Temporal Tables

Troubleshoot SQL Azure Bacpac Import Failures

On-Demand Webcast - Monitoring SQL Server with Mobile Devices

Configure the SQL Server Integration Services For Loop Container

On-Demand Webcast - Performance Monitoring with Uptime

On-Demand Webcast - Physical and Virtual Performance Monitoring with SQL Diagnostic Manager

On-Demand Webcast - Encrypting Data with SQL Server

On-Demand Webcast - Performance monitoring with Spotlight on SQL Server

Whitepaper - Resolving the Database Performance Blame Game

3 Tips for Managing Large Numbers of SQL Server Jobs

Accelerate SQL Server with Flash Storage














Making data imports into SQL Server as fast as possible

MSSQLTips author Greg Robidoux By:   |   Read Comments (2)   |   Related Tips: More > Integration Services Performance



>>> >> > Vote for your favorite MSSQLTips.com Authors < << <<<


Problem
When bulk loading data into SQL Server, in most cases you want to have this process run as fast as possible.  Depending on the options you select and how indexes are setup on the table, the time to load the data could differ quite drastically.  In this tip we will take a look at different options for bulk loading along with examples and the time it takes for each of these operations to complete.

Solution
There are several ways to load data into SQL Server and export data from SQL Server here are a few of those previous tips.  Refer to these tips if you are not familiar with this process.


In our tests we took six different approaches for loading the data into the table. 

  1. Table has a clustered and 2 non-clustered indexes
  2. Table has only a clustered index
  3. Table has no indexes
  4. Table has no indexes and used "TABLOCK" hint
  5. Table has only a clustered index and used "TABLOCK" hint
  6. Table has a clustered and 2 non-clustered indexex

In addition, the database was set to Bulk-Logged recovery.

For these tests after each iteration the database and log were backed up to clear the log file.  Also, before each step the table was dropped and recreated.

Test 1

The first run was done using a dataset of 121,317 rows and an empty table. Before each run the table was recreated using the specifics for that test.

As you can see from the results below there is a difference depending on the process that you use to load the data.

ID Test Run(1) Run(2)
1 with clustered index and 2 non-clustered indexes 5.1 5.3
2 with clustered index only 3.2 3.0
3 with no indexes 1.4 1.4
4 with no indexes and tablock 1.2 1.3
5 with clustered index only and tablock 2.8 2.5
6 with clustered index and 2 non-clustered indexes and tablock 4.1 3.9

Test 2

The next run was done using a dataset of 242,634 rows and an empty table. Before each run the table was recreated using the specifics for that test.

Again, you can see there are quite a few differences for these runs.

ID Test Run(1) Run(2)
1 with clustered index and 2 non-clustered indexes 14.0 13.8
2 with clustered index only 6.9 7.3
3 with no indexes 2.7 2.7
4 with no indexes and tablock 2.5 2.5
5 with clustered index only and tablock 5.6 5.5
6 with clustered index and 2 non-clustered indexes and tablock 8.4 8.7

Test 3

In this test we ran two loads.  The first load was done using a dataset of 121,317 rows and an empty table and then we loaded another 121,317 records into the same table. Before each run the table was recreated using the specifics for that test.

ID Test Load(1) Load(2) Total
1 with clustered index and 2 non-clustered indexes 8.4 7.7 16.1
2 with clustered index only 3.5 3.2 6.7
3 with no indexes 1.5 1.4 2.9
4 with no indexes and tablock 1.3 1.3 2.6
5 with clustered index only and tablock 3.1 4.0 7.1
6 with clustered index and 2 non-clustered indexes and tablock 4.0 8.5 12.5

Summary

As you can see from these results and these small loads the various options do effect the time it takes the process to complete.  Therefore for faster loads you may want to load your tables without any indexes and then later create the indexes after the load.  The one step we did not do is to see how much time the index creation would take if this was done after the data load.

In addition, make sure your data and log files have sufficient space to complete the operation without having to autogrow these files.  The autogrow process can significantly effect the overall load time.

Lastly, when using the tablock option your operations will be minimally logged which is a good thing, but take a look at this previous tip, Minimally Logging Bulk Load Inserts into SQL Server, as to how this works depending on the state of the table. This is one of the reasons why in Test 3 the second load took longer to complete since there was already data in the table.

 

Next Steps

 



Last Update: 11/20/2007


About the author
MSSQLTips author Greg Robidoux
Greg Robidoux is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.

View all my tips


print tip Print  
Become a paid author





join MSSQLTips for free SQL Server tips     



Learn more about SQL Server tools


Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Tuesday, January 05, 2010 - 12:30:52 PM - admin Read The Tip

One thing this test does not do in the tip is to build the indexes after the data load.  So you are probalby right that removing the clustered index, loading the data and then creating the clustered index does not save a lot of time.

Also, another point that should have been added is if there is a need to have a clustered index it helps having the data being loaded sorted based on the clustered key to save some processing time.

Here is another tip that references how bulk loading is performed and use of the transaction log: http://www.mssqltips.com/tip.asp?tip=1185.  This also impacts how long the process will take.

I do not have any tests that show dropping and recreating versus keeping them in place, but that would be a good test to perform.

The newsletter recycles older tips, that is why this one came up today.  The top entry is the newest tip and the other tips are pulled from the different categories of tips.


Tuesday, January 05, 2010 - 11:58:50 AM - Dwainew Read The Tip

Written in 2007, but included in a newsletter sent today.... my 2 cents:

Every time I've tested loading into a heap and then building a CI vs. loading into a CI, the difference has been negligible.  If this is a cyclical process, you also have to account for the time required to REMOVE the CI before starting the load.

 considering the time and maintenance required to maintain scripts that drop and recreate the CI, I don't think it's worth it.

 Regarding NCIs, that's a different story due to the number and potential size, but the ability to disable an index before loading saves from having to drop and recreate.

 Any hard data  on these processes would be appreciated.




 
Sponsor Information







 

Follow

Get Free SQL Tips

Twitter

LinkedIn

Google+

Facebook

Pinterest

RSS

Learning

DBAs

Developers

BI Professionals

Careers

Q and A

Today's Tip

Resources

Tutorials

Webcasts

Whitepapers

Tools

Search

Tip Categories

Search By TipID

Authors

Community

First Timer?

Pictures

Free T-shirt

Contribute

Events

User Groups

Author of the Year

More Info

Join

About

Copyright

Privacy

Disclaimer

Feedback

Advertise

Copyright (c) 2006-2015 Edgewood Solutions, LLC All rights reserved
Some names and products listed are the registered trademarks of their respective owners.