Using a Simple SQL Server Bulk Insert to View and Validate Data



Problem

My company uses a vendor-supplied application for loading data into a specialized SQL Server OLTP database that has a matching, highly integrated enterprise application. The input application has a large set of data layouts for inputting different kinds of records into the OLTP database. The wide range of layouts, some of them complex, makes it difficult for our company to track down the source of errors, such as data issues, bad data formats, or even faulty application configuration settings. Are there any simple SQL Server T-SQL capabilities for validating text files, such as those required by our input application?

Solution

The BULK INSERT statement is designed specifically for inputting bulk data sources, such as large text files, into a SQL Server table. After the contents of the text file are in a table, you have the full power of T-SQL available for validating them. This tip uses a pair of easy-to-follow examples to show how to take advantage of the BULK INSERT statement for inputting text files into a SQL Server table and then how to use T-SQL to validate the input from one or more imported text files.

The BULK INSERT statement is a moderately rich T-SQL statement with an array of parameters to assist you with configuring its functionality. However, for inputting a text file, the statement's default settings are often appropriate without any tweaking. Therefore, after specifying a database to hold a text file's contents, all you have to do is indicate the name and path of the file to be imported and the target table in SQL Server that receives it.

Steps for inputting a text file to a SQL Server table

With three easy steps, you can import a text file into a SQL Server instance via the BULK INSERT statement.

  • First, designate the database holding the target table to be populated with the contents of the text file.
  • Second, create a fresh copy of the table to receive the file's contents.
  • Third, invoke the BULK INSERT statement so that it points at the text file (C:\foo\SampleTextFile.txt) as its input and the target table as its output.

The following short script illustrates the application of these three steps. The USE statement designates the MSSQLTips database as the container for the target table to hold the text file's contents.

There are at least a couple of different strategies to prepare a target table for holding the text file's contents. In the following script, an EXISTS operator tests for the existence of a previously created target table (the ImportedFileTable table in the default schema, such as dbo, of the MSSQLTips database). If the object exists, the table is dropped so that a fresh copy can be created by the following CREATE TABLE statement. The target table in this example has a single varchar(max) column, the maximum width for a varchar column. This approach of stuffing each line from a text file into a single column is especially appropriate when field values are positionally specified within each line of the text file. Non-default BULK INSERT configuration settings can accommodate requirements when input column values are delimited by a special character, such as a comma (,) or a pipe (|), as the sketch after the script shows.

USE MSSQLTips
GO

-- Remove prior ImportedFileTable
IF EXISTS (SELECT * FROM sys.objects WHERE name = 'ImportedFileTable' AND type = 'U')
    DROP TABLE ImportedFileTable

-- Create ImportedFileTable
CREATE TABLE ImportedFileTable(
textvalue varchar(max)
)
GO

-- Import text file
BULK INSERT ImportedFileTable
FROM 'C:\foo\SampleTextFile.txt'
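
For a file with delimited rather than positional field values, a couple of non-default settings are usually all that is needed. The following is a minimal sketch rather than part of the sample application: the table name, columns, and file path are hypothetical, and the FIELDTERMINATOR, ROWTERMINATOR, and FIRSTROW settings must be matched to your file's actual layout.

-- Hypothetical three-column target table for a comma-delimited file
CREATE TABLE ImportedCsvTable(
record_type varchar(2),
account_number varchar(20),
amount varchar(30)
)
GO

-- Import a comma-delimited file; the path and settings are illustrative
BULK INSERT ImportedCsvTable
FROM 'C:\foo\SampleDelimitedFile.txt'
WITH (
   FIELDTERMINATOR = ','   -- column values separated by commas
  ,ROWTERMINATOR = '\n'    -- one record per line
  ,FIRSTROW = 2            -- skip a header row, if the file has one
)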
 

Validating a Text File in a SQL Server Table

After you get your text file into a SQL Server table, you can use queries to verify its contents. The queries that you run depend on the specifications for the input file and the type of test that your organization thinks is most appropriate.

The following script illustrates two very simple data profiling queries for the contents of a text file in the ImportedFileTable. The first query lists the rows in the table. Whether you display all the rows will likely depend on how many rows the table contains. However, it is often helpful to see some subset of rows when attempting to validate data.

The second query counts the rows based on the first two characters of a row. In the context of the sample application, these first two characters denote the record type. For example, 00 is for a header row and 99 is for a trailer row.

-- List file contents
SELECT * FROM ImportedFileTable

-- Report count of record types
SELECT
  LEFT(textvalue,2) record_type, COUNT(*) record_type_count
FROM ImportedFileTable
GROUP BY LEFT(textvalue,2)
ORDER BY LEFT(textvalue,2)
 

The following screen shot displays the two result sets output by the preceding script. Notice from the first result set that each line from the input file starts with a two-character number. This number designates the record type: 01 is for a record with information about a debtor, 31 is for information about a specific debt, 35 is for information about amounts associated with a debt, and 37 references the owners of a debt when there is more than one.

The second result set shows the count for each record type. It is common to confirm transfers by verifying that the number of output records matches the number of input records; a sketch of such checks appears after the screen shot.

[Screen shot: the two result sets, listing the file contents and the record type counts]
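
Beyond eyeballing the counts, you can codify such checks. The following sketch confirms that the file contains exactly one header (00) record and one trailer (99) record, and then reconciles a record count embedded in the trailer with the rows actually loaded. The trailer layout here is assumed purely for illustration (a zero-padded count in positions 3 through 10); substitute the positions from your vendor's actual layout.

-- Confirm exactly one header (00) and one trailer (99) record
SELECT
  SUM(CASE WHEN LEFT(textvalue,2) = '00' THEN 1 ELSE 0 END) header_count,
  SUM(CASE WHEN LEFT(textvalue,2) = '99' THEN 1 ELSE 0 END) trailer_count
FROM ImportedFileTable

-- Reconcile the trailer's embedded record count (hypothetical layout:
-- a zero-padded count in positions 3-10) with the loaded data records
DECLARE @trailer_count int, @actual_count int

SELECT @trailer_count = CAST(SUBSTRING(textvalue, 3, 8) AS int)
FROM ImportedFileTable
WHERE LEFT(textvalue,2) = '99'

SELECT @actual_count = COUNT(*)
FROM ImportedFileTable
WHERE LEFT(textvalue,2) NOT IN ('00', '99')

SELECT
  @trailer_count trailer_count,
  @actual_count actual_count,
  CASE WHEN @trailer_count = @actual_count THEN 'match' ELSE 'mismatch' END count_check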

Validating a Change to a Text File in a SQL Server Table

Let's say that the creator of the input file incorrectly specified a 37 record for account number 123. This situation could be fixed by creating a new input file that omits the 37 record for account number 123 but is otherwise identical to the original. By a relatively simple extension of the approach for inputting the first file, you can load the corrected file and even verify that it satisfies the requirement of no 37 record.

The following code excerpt validates the change to the text file. The script starts by preparing a second table for the second corrected text file. Then, it uses the BULK INSERT statement to load the file (C:\foo\SampleTextFile_wo_37.txt) into the table (ImportedFileTable_wo_37).

Next, an EXCEPT operator sandwiched between two SELECT statements returns all rows from the original file that are missing from the second corrected file. The screen shot after the following script verifies that there is just one record in the original file which is missing from the second file. This missing record is the erroneous 37 record for account number 123.

-- Drop and re-create the wo_37 table
IF EXISTS (SELECT * FROM sys.objects WHERE name = 'ImportedFileTable_wo_37' AND type = 'U')
    DROP TABLE ImportedFileTable_wo_37

-- Create a table for the second text file
CREATE TABLE ImportedFileTable_wo_37(
textvalue varchar(max)
)
GO

-- Import second, corrected file
BULK INSERT ImportedFileTable_wo_37
FROM 'C:\foo\SampleTextFile_wo_37.txt'


-- Rows in SampleTextFile.txt missing from SampleTextFile_wo_37.txt
SELECT * FROM ImportedFileTable

EXCEPT

SELECT * FROM ImportedFileTable_wo_37
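
To confirm that the corrected file removes the erroneous record without introducing anything new, you can also run the EXCEPT check in the opposite direction. An empty result set means the second file contains no rows that are missing from the first.

-- Rows in SampleTextFile_wo_37.txt missing from SampleTextFile.txt
-- (an empty result confirms nothing new was added)
SELECT * FROM ImportedFileTable_wo_37

EXCEPT

SELECT * FROM ImportedFileTable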
 


The BULK INSERT statement is both a simple and powerful tool for importing text files into a SQL Server database.
Next Steps





About the author
Rick Dobson is an author and an individual trader. He is also a SQL Server professional with decades of T-SQL experience that includes authoring books, running a national seminar practice, working for businesses on finance and healthcare development projects, and serving as a regular contributor to MSSQLTips.com. He has been growing his Python skills for more than half a decade, especially for data visualization and ETL tasks with JSON and CSV files. His most recent professional passions include financial time series data and analyses, AI models, and statistics. He believes the proper application of these skills can help traders and investors make more profitable decisions.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Tuesday, December 5, 2017 - 8:48:00 AM - Rick Dobson

Interesting.

It is my understanding that you have multiple files with the same names but different content. If my understanding is correct, then I would start my ETL process by assigning a unique name to each file. You may be able to do this as simply as assigning an arrival datetime to each file. Additionally, you may be able to get your file providers to name their files differently, such as file_from_provider_x_on_year_month_day.txt and file_from_provider_y_on_year_month_day.txt. At the very least, you should be able to append a suffix to the file names with the date. SSIS readily enables this kind of functionality. Try to have your file naming rules indicate the types of columns in a file. If you are able to achieve this, then you will know what columns are in a file by its name.

If you are not able to get file providers to name files in a way that helps you identify the columns in a file, then consider requesting file providers to include a header row in each file with the names of the columns in the file. Besides column names, you may need some information that allows you to assess whether the data in a column is valid (for example, a date value of February 30, 2017 is not a valid value, or a column of numeric values must have numbers in it, not letters).
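
For example, here is a minimal sketch of that kind of column-level check, run against a hypothetical parsed staging table (StagedRows) with varchar columns transaction_date and amount:

-- Flag rows whose values cannot convert to the expected types
-- (StagedRows and its columns are hypothetical)
SELECT *
FROM StagedRows
WHERE TRY_CONVERT(date, transaction_date) IS NULL   -- catches dates like February 30, 2017
   OR TRY_CONVERT(decimal(18,2), amount) IS NULL    -- catches letters in a numeric column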

I hope this feedback moves you in a useful direction towards the resolution of your problem.

Rick Dobson

Tuesday, December 5, 2017 - 1:14:37 AM - Tawanda Mkutuma

 

Hi Rick

Thank you for the detailed response. Let me reiterate my question: I have multiple flat files with more than 200 columns each, and I need to automate the ETL to SQL tables. What I meant by inconsistent columns was that the columns change: these flat files do not come with fixed columns, either some columns are dropped or more columns are added, but the file name still remains the same. My challenge is how to create an efficient ETL process that will dynamically load these flat files. What other options would you advise?

 

Thank you

Tawanda


Monday, December 4, 2017 - 1:00:12 PM - Rick Dobson

Hi Tawanda Mkutuma,

I do not understand what you mean by "flat file with inconsistent columns."

In any event, my tip was about reading a file with a single column of string values and then parsing the string values into individual columns after the BULK INSERT transfers them to a staging table.
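
For illustration, here is a minimal sketch of that parsing step; the field positions and column names are purely hypothetical and would need to match your actual layout:

-- Parse positional fields out of the single staging column
-- (positions and names are illustrative only)
SELECT
  LEFT(textvalue,2) record_type,
  SUBSTRING(textvalue, 3, 10) account_number,
  SUBSTRING(textvalue, 13, 8) amount
FROM ImportedFileTable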

You say that your files have over 200 columns. If this means the column values have delimiters, then you may find the BULK INSERT approach less efficient than other approaches that take advantage of the delimiters when transferring the column values from your files to a SQL Server table.

I hope this helps you.

Rick Dobson

Monday, December 4, 2017 - 4:07:27 AM - Tawanda Mkutuma

Hi Rick

I am struggling with bulk insert from a flat file with inconsistent columns into a SQL table. My flat file has more than 200 columns. I would like to create a dynamic way of automating the uploads into SQL tables where a user can specify which file to run at a given time.

Monday, March 23, 2015 - 10:08:22 PM - Rick Dobson

Thanks for giving the tip a try. I am sorry that it did not work with SQL Azure. I do not work with SQL Azure.

This tip was developed and tested in a secure access zone on a private network accessed via a VPN. If you are able to work in that kind of environment, I am optimistic that the technique will work well for you.

Monday, March 23, 2015 - 3:48:42 PM - Mike OMalley

Hi Rick,

I've had problems using BULK INSERT on our new Azure SQL Server database. Have you experienced this? Are there any workarounds that you are aware of?

Thanks,
Mike


Friday, January 16, 2015 - 9:33:14 AM - Rick Dobson

Have you tried using the Import/Export Wizard? It has dialog boxes and is relatively easy and self-explanatory to use. Sometimes a little tweaking may be required, but it often works right without any.


Monday, January 12, 2015 - 12:52:28 PM - Morajaa ALMalky

Thanks for this useful info.

Please, I need to know how to import/export from/to Excel 2007 to an MSSQL table.


Friday, January 9, 2015 - 11:01:03 AM - rick willemain

Thank you. Very nice summary review!!!














