Using OPENROWSET to read large files into SQL Server

By: Andy Novick   |   Read Comments (17)   |   Related Tips: More > Import and Export

Problem
OPENROWSET is a T-SQL function that can read data from many sources, including SQL Server's BULK import capability. One of the useful features of the BULK provider is its ability to read individual files from the file system into SQL Server, such as loading data from a text file or a Word document into a SQL Server table. This capability is the subject of this tip.

Solution
The BULK option was added to T-SQL in SQL Server 2005 and persists in SQL Server 2008. In SQL Server 2000 it was possible to read and write to the file system using the sp_OACreate and sp_OAMethod OLE Automation stored procedures. These procedures continue to work, but are disabled by default because of security concerns, and it is better practice to use more secure capabilities like OPENROWSET when they are available.

When used with the BULK provider keyword you can name a data file to read as one of three types of objects:

  • SINGLE_BLOB, which reads a file as varbinary(max)
  • SINGLE_CLOB, which reads a file as varchar(max)
  • SINGLE_NCLOB, which reads a file as nvarchar(max)

OPENROWSET returns a single column, named BulkColumn, as its result. Here's an example that reads a text file:

SELECT BulkColumn
FROM OPENROWSET (BULK 'c:\temp\mytxtfile.txt', SINGLE_CLOB) MyFile

The correlation name, in this case MyFile, is required by OPENROWSET.

There are additional requirements to observe when reading single files, as described below.

Access control is always a concern. The operating-system-level file operations that read the file execute with the privileges of the account the SQL Server database engine runs under. Therefore, only files accessible to that account may be read. This includes network drives and UNC paths, which work as long as the account has the necessary privileges. If you want to read network files, run SQL Server as a domain user.
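For example, reading over a UNC path uses exactly the same syntax; it succeeds only when the service account has rights to the share. The server and share names below are placeholders:

SELECT BulkColumn
FROM OPENROWSET (BULK '\\fileserver\share\mytxtfile.txt', SINGLE_CLOB) MyFile

If the account lacks permission, the statement fails with an operating system "Access is denied" style error rather than a T-SQL syntax error.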

The BULK provider won't convert between Unicode and plain ASCII files. It must be told which encoding the file uses. If you get it wrong, the result is error 4806, as seen here:

SELECT BulkColumn
FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_CLOB) MyFile
Msg 4806, Level 16, State 1, Line 1
SINGLE_CLOB requires a double-byte character set (DBCS) (char) input file. The file specified is Unicode.

Unicode files must be read with the SINGLE_NCLOB option shown here:

SELECT BulkColumn
FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_NCLOB) MyFile

Similarly, files with non-text structures, such as Word documents, are not converted. They must be converted by some other mechanism before being read, or they can be read as binary files with the SINGLE_BLOB option.
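As a sketch, a binary file such as a Word document could be captured into a varbinary(max) column like this (the table name and path below are hypothetical):

-- Hypothetical table for storing whole files as binary data
CREATE TABLE dbo.Documents (
    doc_name NVARCHAR(256) NOT NULL,
    doc_data VARBINARY(MAX) NOT NULL
)
GO
-- SINGLE_BLOB reads the file byte-for-byte, with no character-set conversion
INSERT INTO dbo.Documents (doc_name, doc_data)
SELECT 'SampleWordDoc.doc', BulkColumn
FROM OPENROWSET (BULK 'c:\temp\SampleWordDoc.doc', SINGLE_BLOB) AS MyFile

Because no conversion is attempted, SINGLE_BLOB is the safe choice for any file whose encoding you cannot guarantee.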

OPENROWSET isn't flexible about how you provide the name of the file. It must be a string constant. That requirement forces the use of dynamic SQL when the file name isn't known in advance.

Here's a stored procedure that reads any text file and returns the contents as an output variable:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROC [dbo].[ns_txt_file_read]
    @os_file_name NVARCHAR(256)
   ,@text_file VARCHAR(MAX) OUTPUT
/* Reads a text file into @text_file
*
* Transactions: may be in a transaction but is not affected
* by the transaction.
*
* Error Handling: Errors are not trapped and are thrown to
* the caller.
*
* Example:
    declare @t varchar(max)
    exec ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output
    select @t as [SampleTextDoc.txt]
*
* History:
* WHEN       WHO        WHAT
* ---------- ---------- ---------------------------------------
* 2007-02-06 anovick    Initial coding
**************************************************************/
AS

DECLARE @sql NVARCHAR(MAX)
      , @parmsdeclare NVARCHAR(4000)

SET NOCOUNT ON

SET @sql = 'select @text_file=(select * from openrowset (
           bulk ''' + @os_file_name + '''
           ,SINGLE_CLOB) x
           )'

SET @parmsdeclare = '@text_file varchar(max) OUTPUT'

EXEC sp_executesql @stmt = @sql
                 , @params = @parmsdeclare
                 , @text_file = @text_file OUTPUT

To see how it works, just execute the example script. First create a text file called "SampleTextDoc.txt" and add some text data to it. For our example we added the text "The quick brown fox jumped over the lazy dog."

DECLARE @t VARCHAR(MAX)
EXEC ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t OUTPUT
SELECT @t AS [SampleTextDoc.txt] 

The results are:

SampleTextDoc.txt
The quick brown fox jumped over the lazy dog.

(1 row(s) affected)

The performance of reading text files is remarkably fast because the files are read sequentially. Using 64-bit SQL Server 2008 on a development machine, reading a file of 750,000,000 bytes took only 7 seconds.

Next Steps

  • If you need to bulk insert large text files or binary objects into SQL Server 2005 or 2008, look at using OPENROWSET.


Last Update: 12/10/2008


About the author
Andy Novick is a SQL Server Developer in the Boston area with 25 years of database and application development experience.




Comments and Feedback:
Sunday, December 28, 2008 - 11:23:15 PM - snakyjake Read The Tip

What are the differences between using BCP and OPENROWSET?  Why use OPENROWSET over BCP?

For transferring data between two SQL Servers, why not use OPENDATASOURCE? 

Or why not use Replication?

Thank you,

Jake


Monday, December 29, 2008 - 8:39:05 AM - aprato Read The Tip

BCP is optimized for pumping a continuous stream of data.

As for differences:

OPENROWSET is used to execute a query against an OLE DB provider. You're providing connection details as well as the query you wish to run.

OPENDATASOURCE returns a reference to a data source which can be used in four-part naming for those situations where you don't have a linked server set up. It tells how to connect to a server.

 


Monday, December 29, 2008 - 9:47:30 AM - snakyjake Read The Tip

Aprato,

Thank you for the reply.  In the article it says I can use OPENROWSET to read a file.  I assume the file is streamed, and therefore the same performance as BCP.

Just as confusing, there are multiple ways to do ETL loads, and I'm not sure what the performance advantages/disadvantages are of each option (assuming that source and target are both SQL Server). I imagine someone has already done the performance tests, or knows how they work, and can help provide the information.

Thank you,

Jake


Monday, December 29, 2008 - 10:13:41 AM - aprato Read The Tip

http://msdn.microsoft.com/en-us/library/ms187042(SQL.90).aspx

Consider SQL Server Replication.  It needs to transfer the data from the log on the source db to the destination db.  How does it accomplish this?  Under the covers it uses bcp.   If that's the case, then my suspicion is that bcp may be the most performant way.  Your own tests on large sets of data can verify this.


Wednesday, December 31, 2008 - 9:08:27 AM - sjimmo@charter.net Read The Tip

In my tests, using a text file with several fields/columns per row, I found:

Using OpenRowSet was slower than BCP on files with many records (500,000+).

Using OpenRowSet read the records en masse into tempdb and then applied them to the database table. BCP/DTS and SSIS allow me to control the batch size, which then allowed those records with no errors to be applied. Using the error file, I can find the records with problems, fix them, and then reapply them. I could not do this with OpenRowSet.

I may not be using all of the functionality of OpenRowSet, but that is what I have found in the past few years. Thus, on very large files, I use either DTS, SSIS or BCP. On smaller files, I use OpenRowSet and can control it within a transaction batch.


Wednesday, December 31, 2008 - 12:08:40 PM - snakyjake Read The Tip

Appreciate the reply.

I like the explanation on why OpenRowset is thought to be slower -- loading into tempdb.

Does BCP load into tempdb too?  If so, then the primary advantage of BCP is the easy use of batching.


 


Wednesday, December 31, 2008 - 12:43:54 PM - aprato Read The Tip

sjimmo...did your test use the OPENROWSET(BULK...) syntax?  Just curious.


Wednesday, December 31, 2008 - 1:32:52 PM - sjimmo@charter.net Read The Tip

I tried it with and without BULK. This was actually done in an attempt to eliminate DTS packages and go to straight SQL code. Unfortunately some of my files contain 50M+ records and are loaded nightly. DTS and BCP always won hands down.


Wednesday, December 31, 2008 - 1:37:47 PM - aprato Read The Tip

 I suspected it would but I've never done a formal test.  You should write a tip on your findings.  It would be really enlightening.


Wednesday, December 31, 2008 - 1:50:33 PM - sjimmo@charter.net Read The Tip

Maybe as I learn more about the forum, customs/habits, etc. I joined today, only because I was reading the subject article and felt that I had to add my experience(s). I have been doing this with Sybase and MS SQL Server for quite a few years (longer than one would like to admit).

Thanks


Tuesday, May 29, 2012 - 8:41:23 PM - ricardo Read The Tip

SELECT  *
FROM OPENROWSET
(    BULK 'pathfile',
    SINGLE_CLOB    
) AS A


Monday, June 25, 2012 - 8:41:43 AM - Heba Read The Tip

Thanks this post is so helpful to me

You said:

If there is a need to bulk insert large text files or binary objects into SQL Server 2005 or 2008 look at using OPENROWSET

I searched a lot on how to read large files but I didn't find anything helpful.

Could you please help me?



Friday, July 20, 2012 - 8:55:16 AM - Ramakrishna Kappagantula Read The Tip

Is there any limitation on the number of columns that can be handled using OPENROWSET? If so, could you please let us know the maximum number of columns? For example, I have a .CSV file which contains data (i.e., a questionnaire) with 3500 columns in it. Just wondering whether it can support that many columns. If so, I just want to unpivot all the questions. Please suggest?


Thursday, October 11, 2012 - 8:17:20 AM - SPRAO Read The Tip

Hi

I have a Classic ASP application which uploads one file at a time to a column in a SQL Server table.

I am using the following command and it is working fine.

Server.CreateObject("ADODB.Stream")

INSERT INTO MY_TABLE (FILE_NAME, REPORT_FILE) VALUES(?, ?)

ADODB.Command object is to build the query.

ADODB.Stream object is used to read the file from the hard disk and assign to the parameter

StatementADO.Parameters(1) = ReportFile.Read

The code is working fine. I have to load only one file when user interacts with the application.

Do I still need to use OPENROWSET ?

 

Thursday, March 28, 2013 - 4:38:07 PM - Suman Read The Tip

 

Thanks this post is so helpful to me


Friday, June 07, 2013 - 11:23:24 AM - VincentVega Read The Tip

Hi! And thanks a lot for this article!

But now I have another problem. I have files with tables like csv, xls, and till now I use this to select the content (txt example):

SELECT * FROM OPENROWSET('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)};DEFAULTDIR=E:\;Extensions=CSV;','SELECT * FROM c:\_test.txt')

Is it possible to do something like that directly with a blob field (filestream), or do I have to get the guid and type of the filestream I want to select and build a statement like that (txt example):

SELECT * FROM OPENROWSET('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)};DEFAULTDIR=E:\;Extensions=CSV;','SELECT * FROM c:\myfilestreams\0000001d-00000143-0002')

Unfortunately that doesn't work, too ;(  -- I think because of security issues?!

Any help welcome.


Wednesday, December 11, 2013 - 5:14:56 AM - sonali Read The Tip

Hi

    Thanks for the article

I need to read the content of large files up to 50 MB; these are AI files or can be of any other type. When using OPENROWSET, it reads the file but truncates it in the middle.

Is there any way to read the file fully and then insert it into the database? I have both varchar(max) and varbinary(max) columns in my table.

Thanks a lot in advance

sonali

 



Copyright (c) 2006-2014 Edgewood Solutions, LLC All rights reserved
Some names and products listed are the registered trademarks of their respective owners.