Using OPENROWSET to read large files into SQL Server



Problem

OPENROWSET is a T-SQL function that allows for reading data from many sources, including SQL Server's BULK import capability. One of the useful features of the BULK provider is its ability to read individual files from the file system into SQL Server, such as loading data from a text file or a Word document into a SQL Server table. This capability is the subject of this tip.

Solution

The BULK option was added to T-SQL in SQL Server 2005.

When used with the BULK provider keyword, you can name a data file to read as one of three types of objects:

  • SINGLE_BLOB, which reads a file as varbinary(max)
  • SINGLE_CLOB, which reads a file as varchar(max)
  • SINGLE_NCLOB, which reads a file as nvarchar(max)

OPENROWSET returns a single column, named BulkColumn, as its result. Here's an example that reads a text file:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\mytxtfile.txt', SINGLE_CLOB) MyFile 

The correlation name, in this case MyFile, is required by OPENROWSET.

There are additional requirements to observe when reading single files, described below.

Access control is always a concern. The operating-system-level file operations that read the file execute with the privileges of the account the SQL Server database engine runs under, so only files accessible to that account can be read. This includes network drives or UNC paths, which work if the account has the necessary privileges. If you want to read network files, run SQL Server as a domain user.
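If you're not sure which account that is, a quick check is to query the service account directly. This is a minimal sketch; the sys.dm_server_services DMV assumes SQL Server 2008 R2 or later:

-- Show the account each SQL Server service runs under 
-- (sys.dm_server_services requires SQL Server 2008 R2 or later) 
SELECT servicename, service_account 
FROM sys.dm_server_services 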

The BULK provider won't convert between Unicode and plain ASCII files. It must be told which type of encoding is used in the file. If you don't specify the right option, the result is error 4806, as seen here:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_CLOB) MyFile 
Msg 4806, Level 16, State 1, Line 1
SINGLE_CLOB requires a double-byte character set (DBCS) (char) input file. The file specified is Unicode.

Unicode files must be read with the SINGLE_NCLOB option shown here:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_NCLOB) MyFile 

Similarly, files with non-text structures, such as Word documents, are not converted. They must be converted by some other mechanism before being read, or they can be read as binary files with the SINGLE_BLOB option.
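For instance, a Word document can be read with SINGLE_BLOB and stored in a varbinary(max) column. Here's a minimal sketch; the dbo.Documents table and the file name are examples only:

-- Hypothetical table to hold binary documents 
CREATE TABLE dbo.Documents (doc_name NVARCHAR(256), doc_content VARBINARY(MAX)) 

-- Read the file as varbinary(max) and store it 
INSERT INTO dbo.Documents (doc_name, doc_content) 
SELECT 'mydoc.doc', BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\mydoc.doc', SINGLE_BLOB) MyFile 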

OPENROWSET isn't flexible about how you provide the name of the file: it must be a string constant. That requirement forces the use of dynamic SQL when the file name isn't known in advance.

Here's a stored procedure that reads any text file and returns the contents as an output variable:

SET ANSI_NULLS ON 
GO 
SET QUOTED_IDENTIFIER ON 
GO 

CREATE PROC [dbo].[ns_txt_file_read]  
    @os_file_name NVARCHAR(256) 
   ,@text_file VARCHAR(MAX) OUTPUT  
/* Reads a text file into @text_file 
* 
* Transactions: may be in a transaction but is not affected 
* by the transaction. 
* 
* Error Handling: Errors are not trapped and are thrown to 
* the caller. 
* 
* Example: 
    declare @t varchar(max) 
    exec ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output 
    select @t as [SampleTextDoc.txt] 
* 
* History: 
* WHEN       WHO        WHAT 
* ---------- ---------- --------------------------------------- 
* 2007-02-06 anovick    Initial coding 
**************************************************************/  
AS  
DECLARE @sql NVARCHAR(MAX) 
      , @parmsdeclare NVARCHAR(4000)  

SET NOCOUNT ON  

SET @sql = 'select @text_file=(select * from openrowset ( 
           bulk ''' + @os_file_name + ''' 
           ,SINGLE_CLOB) x 
           )' 

SET @parmsdeclare = '@text_file varchar(max) OUTPUT'  

EXEC sp_executesql @stmt = @sql 
                 , @params = @parmsdeclare 
                 , @text_file = @text_file OUTPUT 
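One caution about the procedure: the file name is concatenated directly into the dynamic SQL string, so an embedded single quote would break the statement or open the door to injection. A minimal hardening step is to double any quotes before concatenating, sketched here as a variation on the SET @sql line above:

-- Double embedded single quotes so the file name cannot 
-- break out of the string literal 
SET @sql = 'select @text_file=(select * from openrowset ( 
           bulk ''' + REPLACE(@os_file_name, '''', '''''') + ''' 
           ,SINGLE_CLOB) x 
           )' 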

To see how it works, just execute the example script. First, create a text file called "SampleTextDoc.txt" and add some text data to the file. For our example we added the text "The quick brown fox jumped over the lazy dog."

DECLARE @t VARCHAR(MAX) 
EXEC ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output 
SELECT @t AS [SampleTextDoc.txt]  

The results are:

SampleTextDoc.txt
The quick brown fox jumped over the lazy dog. 
(1 row(s) affected)

The performance of reading text files is remarkably fast because the files are read sequentially. Using 64-bit SQL Server on a development machine, reading a file of 750,000,000 bytes took only 7 seconds.

Next Steps
  • If there is a need to bulk insert large text files or binary objects into SQL Server, look at using OPENROWSET





About the author
Andy Novick is a SQL Server Developer in the Boston area with 25 years of database and application development experience.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Wednesday, July 29, 2020 - 10:25:45 AM - Greg Robidoux Back To Top (86210)

Hi Woody,

What are you trying to do?  Are you trying to read the SQL Server error logs or some other log files?

-Greg


Tuesday, July 28, 2020 - 8:46:08 PM - Woody Back To Top (86207)

Thanks for this post.

I've run this on a log file but I only get 1 row/column of data. Is this the intended result? If so, this solution is not really suited for my intention of querying the log file for errors. Can anyone suggest a solution?

thanks.


Tuesday, May 19, 2020 - 12:07:45 PM - Luis Alzate Back To Top (85698)

Excellent, I made the adjustments to my code and it works perfectly. Thanks a lot!!!!


Tuesday, October 11, 2016 - 1:21:17 PM - Cottage Back To Top (43538)

Hello,

Can you please tell me if we can import more than 256 columns using this method?

 


Thursday, February 18, 2016 - 10:13:26 PM - Miguel Back To Top (40725)

Thank you, great post.

 


Saturday, October 11, 2014 - 3:20:51 AM - Ricky Jairath Back To Top (34930)

 

I need to insert my data from an Excel file into my database, but my tables in the database are dynamic.

Wednesday, December 11, 2013 - 5:14:56 AM - sonali Back To Top (27759)

Hi

    Thanks for the article

I need to read the content of large files up to 50 MB. These are .ai files, or can be of any other type. When using OPENROWSET, it reads the file but truncates it in the middle.

Is there any way to read the file fully and then insert it into the database? I have both varchar(max) and varbinary(max) columns in my table.

Thanks a lot in advance

sonali

 


Friday, June 7, 2013 - 11:23:24 AM - VincentVega Back To Top (25337)

Hi! And thanks a lot for this article!

But now I have another problem. I have files with tables, like CSV and XLS, and till now I use this to select the content (txt example):

SELECT * FROM OPENROWSET('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)};DEFAULTDIR=E:\;Extensions=CSV;','SELECT * FROM c:\_test.txt')

Is it possible to do something like that directly with a blob field (filestream), or do I have to get the GUID and type of the filestream I want to select and build a statement like that (txt example):

SELECT * FROM OPENROWSET('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)};DEFAULTDIR=E:\;Extensions=CSV;','SELECT * FROM c:\myfilestreams\0000001d-00000143-0002')

Unfortunately that doesn't work, too ;(  -- I think because of security issues?!

Any help welcome.


Thursday, March 28, 2013 - 4:38:07 PM - Suman Back To Top (23062)

 

Thanks, this post is so helpful to me.


Thursday, October 11, 2012 - 8:17:20 AM - SPRAO Back To Top (19874)

Hi

I have a Classic ASP application which uploads one file at a time to a column in a SQL Server table.

I am using the following command and it is working fine.

Server.CreateObject("ADODB.Stream")

INSERT INTO MY_TABLE (FILE_NAME, REPORT_FILE) VALUES(?, ?)

The ADODB.Command object is used to build the query.

The ADODB.Stream object is used to read the file from the hard disk and assign it to the parameter:

StatementADO.Parameters(1) = ReportFile.Read

The code is working fine. I have to load only one file when user interacts with the application.

Do I still need to use OPENROWSET ?

 

Friday, July 20, 2012 - 8:55:16 AM - Ramakrishna Kappagantula Back To Top (18705)

Is there any limitation on the number of columns OPENROWSET can handle? If so, could you please let us know the maximum number of columns? For example, I have a .CSV file which contains data (i.e., a questionnaire) with 3500 columns in it. Just wondering whether it can support that many columns. If so, I just want to unpivot all the questions. Please suggest?


Monday, June 25, 2012 - 8:41:43 AM - Heba Back To Top (18189)

Thanks, this post is so helpful to me.

You said:

If there is a need to bulk insert large text files or binary objects into SQL Server 2005 or 2008 look at using OPENROWSET

I searched a lot on how to read large files, but I didn't find anything helpful.

Please, could you help me?



Tuesday, May 29, 2012 - 8:41:23 PM - ricardo Back To Top (17712)

SELECT  *
FROM OPENROWSET
(    BULK 'pathfile',
    SINGLE_CLOB    
) AS A


Wednesday, December 31, 2008 - 1:50:33 PM - [email protected] Back To Top (2485)

Maybe I'll contribute more as I learn the forum's customs/habits, etc. I joined today only because I was reading the subject article and felt that I had to add my experience(s). I have been doing this with Sybase and MS SQL Server quite a few years (longer than one would like to admit).

Thanks


Wednesday, December 31, 2008 - 1:37:47 PM - aprato Back To Top (2484)

 I suspected it would but I've never done a formal test.  You should write a tip on your findings.  It would be really enlightening.


Wednesday, December 31, 2008 - 1:32:52 PM - [email protected] Back To Top (2483)

I tried it with and without BULK. This was actually done in an attempt to eliminate DTS packages and go to straight SQL code. Unfortunately some of my files contain 50M+ records and are loaded nightly. DTS and BCP always won hands down.


Wednesday, December 31, 2008 - 12:43:54 PM - aprato Back To Top (2482)

sjimmo... did your test use the OPENROWSET(BULK…) syntax? Just curious.


Wednesday, December 31, 2008 - 12:08:40 PM - snakyjake Back To Top (2481)

Appreciate the reply.

I like the explanation of why OpenRowset is thought to be slower -- loading into tempdb.

Does BCP load into tempdb too? If so, then the primary advantage of BCP is the easy use of batching.


 


Wednesday, December 31, 2008 - 9:08:27 AM - [email protected] Back To Top (2479)

In my tests, using a text file with several fields/columns per row, I found:

Using OpenRowset was slower than BCP on files with many records (500,000+).

OpenRowSet read the records en masse into tempdb and then applied them to the database table. BCP/DTS and SSIS allow me to control the batch size, which then allows the records with no errors to be applied. Using the error file, I can find the records with problems, fix them, and then reapply them. I could not do this with OpenRowSet.

I may not be using all of the functionality of OpenRowSet, but that is what I have found in the past few years. Thus, on very large files, I use either DTS, SSIS or BCP. On smaller files, I use OPENROWSET and can control it within a transaction batch.


Monday, December 29, 2008 - 10:13:41 AM - aprato Back To Top (2469)

http://msdn.microsoft.com/en-us/library/ms187042(SQL.90).aspx

Consider SQL Server Replication.  It needs to transfer the data from the log on the source db to the destination db.  How does it accomplish this?  Under the covers it uses bcp.   If that's the case, then my suspicion is that bcp may be the most performant way.  Your own tests on large sets of data can verify this.


Monday, December 29, 2008 - 9:47:30 AM - snakyjake Back To Top (2468)

Aprato,

Thank you for the reply. In the article it says I can use OPENROWSET to read a file. I assume the file is streamed, and therefore gives the same performance as BCP.

Just as confusing, there are multiple ways to do ETL loads, and I'm not sure what the performance advantages/disadvantages of each option are (assuming that source and target are both SQL Server). I imagine someone has already done the performance tests, or knows how they work, and can help provide the information.

Thank you,

Jake


Monday, December 29, 2008 - 8:39:05 AM - aprato Back To Top (2467)

BCP is optimized for pumping a continuous stream of data.

As for the differences:

OPENROWSET is used to execute a query against an OLE DB provider. You're providing connection details as well as the query you wish to run.

OPENDATASOURCE returns a reference to a data source which can be used in four-part naming for those situations where you don't have a linked server set up. It tells SQL Server how to connect to a server.
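For example, something like this (the server and database names are just placeholders):

-- Ad hoc four-part name via OPENDATASOURCE 
-- (server/database/schema/table names are placeholders) 
SELECT * 
FROM OPENDATASOURCE('SQLNCLI', 
    'Data Source=RemoteServerName;Integrated Security=SSPI') 
    .AdventureWorks.HumanResources.Department 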

 


Sunday, December 28, 2008 - 11:23:15 PM - snakyjake Back To Top (2464)

What are the differences between using BCP and OPENROWSET?  Why use OPENROWSET over BCP?

For transferring data between two SQL Servers, why not use OPENDATASOURCE? 

Or why not use Replication?

Thank you,

Jake

