Troubleshooting Common SQL Server Bulk Insert Errors


By:   |   Updated: 2015-05-13   |   Comments   |   Related: More > Import and Export

Problem

The following tip addresses some of the questions I get asked about using bulk insert as an ETL tool. Microsoft provides bulk insert with SQL Server and it is one of the most reliable ETL tools I've used. Combined with PowerShell, it can significantly reduce ETL development time, and unlike some alternatives, allows for easy automation and application, such as building an AI that can read a file's format, then create a bulk insert based on the structure by detecting new rows, delimiters, problematic lines, etc. At the time of this tip, bulk insert is available in SQL Server versions 2005, 2008, 2008R2, 2012, and 2014 and there's no evidence Microsoft intends to deprecate it (I've been asked about this last one).

Solution

One popular misconception about bulk insert involves inserting a document, such as a full text file. People will specify a new line character as the row separator, yet that's inaccurate if you want the document in one row, or if there is a different separator. For instance, see the below code and images for an example inserting an HTML document:

Troubleshooting Common Bulk Insert Errors

CREATE TABLE tb_HTMLDocuments(
	HTMLData VARCHAR(MAX)
)


BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
	---- Note the below space only exists to prevent the HTML reader from removing it
	ROWTERMINATOR = '< !--END-->'
)


SELECT *
FROM tb_HTMLDocuments


DROP TABLE tb_HTMLDocuments

HTML_document

If I specified a new line character as the row separator, I would have seen more rows than one. Suppose that I have several HTML documents in one text file and I want multiple rows:

multiple rows

CREATE TABLE tb_HTMLDocuments(
	HTMLData VARCHAR(MAX)
)


BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
	---- Note the below space only exists to prevent the HTML reader from removing it
	ROWTERMINATOR = '< !--END-->'
)


SELECT *
FROM tb_HTMLDocuments


DROP TABLE tb_HTMLDocuments

DROP TABLE tb_HTMLDocuments

A VARCHAR(MAX) field can hold up to 2GB, so a large document will fit into one column provided it doesn't exceed that size. This leads to a related error I get asked about a lot: Bulk load data conversion error (truncation) for row [number], column [number]. Let's generate this error:

CREATE TABLE tb_HTMLDocuments

CREATE TABLE tb_HTMLDocuments(
	HTMLData VARCHAR(1)
)


BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
	---- Note the below space only exists to prevent the HTML reader from removing it
	ROWTERMINATOR = '< !--END-->'
)


SELECT *
FROM tb_HTMLDocuments


DROP TABLE tb_HTMLDocuments

The column size must be able to hold the data size of the values being inserted. The error identifies the specific problem - the row and column. For a document with thousands of rows and hundreds of columns, you could identify what is causing trouble (more often than not, the file has an extra delimiter throwing everything out of place). Some data sources will require a line-by-line solution, like this example tip shows by removing invalid lines; it depends on the data and bulk insert makes it easy to determine where the error is.

This also highlights another tool that developers have available. Suppose we only want bulk insert to hit one error then quit. We can tell it to stop after a certain number of errors:

the_row_and_column

CREATE TABLE tb_HTMLDocuments(
	HTMLData VARCHAR(100)
)


BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
	---- Note the below space only exists to prevent the HTML reader from removing it
	ROWTERMINATOR = '< !--END-->'
	,MAXERRORS=0
)


SELECT *
FROM tb_HTMLDocuments


DROP TABLE tb_HTMLDocumentsdiv

BULK_INSERT_tb_HTMLDocuments

Since the errors (1) exceed our maximum (0), bulk insert inserts nothing. It will continue loading if it doesn't hit the error threshold, and we should be careful when specifying maximums, as in most cases we want it to throw an error and stop.

We can also skip first and last rows; I've run across various (highly accurate) data providers who love to throw in extra lines of meaningless data, which doesn't match the format and bulk insert allows us to skip rows in these cases. The below examples show this:

bulk_insert_inserts

CREATE TABLE tb_HTMLDocuments(
	HTMLData VARCHAR(MAX)
)


BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
	---- Note the below space only exists to prevent the HTML reader from removing it
	ROWTERMINATOR = '< !--END-->'
	,FIRSTROW=2
)


SELECT *
FROM tb_HTMLDocuments


DROP TABLE tb_HTMLDocuments

HTMLData_VARCHAR(MAX)

CREATE_TABLE_tb_HTMLDocuments

CREATE TABLE tb_HTMLDocuments(
	HTMLData VARCHAR(MAX)
)


BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
	---- Note the below space only exists to prevent the HTML reader from removing it
	ROWTERMINATOR = '< !--END-->'
	,LASTROW=2
)


SELECT *
FROM tb_HTMLDocuments


DROP TABLE tb_HTMLDocuments

WITH_TABLOCK

Developers should consider the option WITH TABLOCK, which can be useful on a first load, as it locks the table during the bulk insert load (though, this may be less important on later loads if we're wanting to load multiple files into the same table). For some file formats, we can also use the option FORMATFILE to specify the format file and bulk load data from a file, matching its format files pattern, into the table. In all cases, for performance enhancements, I'd highly suggest looking into removing indexes before a load and adding them after a load, unless the situation requires indexes. And with PowerShell, the more work you do upfront on data scrubbing before loading the data, the more you can skip steps, such as loading data into a staging or temporary table.

Next Steps
  • For even more possible commands, read the documentation on bulk insert provided by Microsoft.
  • Test generating errors are common issues and see how you write code to automate handling those errors; usually, they're all derivatives of the same problem.


Last Updated: 2015-05-13


get scripts

next tip button



About the author
MSSQLTips author Tim Smith Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.






download

























get free sql tips

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



Learn more about SQL Server tools