Using OPENROWSET to read large files into SQL Server
OPENROWSET is a T-SQL function that reads data from many sources, including files, through SQL Server's BULK import capability. One of the useful features of the BULK provider is its ability to read an individual file from the file system into SQL Server, such as loading the contents of a text file or a Word document into a SQL Server table. This capability is the subject of this tip.
The BULK option was added to T-SQL in SQL Server 2005.
When used with the BULK keyword, OPENROWSET can read a named data file as one of three types of objects:
- SINGLE_BLOB, which reads a file as varbinary(max)
- SINGLE_CLOB, which reads a file as varchar(max)
- SINGLE_NCLOB, which reads a file as nvarchar(max)
With any of these options, OPENROWSET returns a single row with a single column, named BulkColumn, as its result. Here's an example that reads a text file:
SELECT BulkColumn FROM OPENROWSET (BULK 'c:\temp\mytxtfile.txt', SINGLE_CLOB) MyFile
The correlation name, in this case MyFile, is required by OPENROWSET.
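As a small illustration of the correlation name in use, the sketch below qualifies BulkColumn with it and also reports how many bytes were read. The output column names FileText and BytesRead are arbitrary names chosen for this example; the file path is the one used above.

```sql
-- Read the whole file as a single varchar(max) value.
-- MyFile is the required correlation name; BulkColumn is the
-- column that OPENROWSET returns for the SINGLE_* options.
SELECT MyFile.BulkColumn             AS FileText,   -- illustrative alias
       DATALENGTH(MyFile.BulkColumn) AS BytesRead   -- illustrative alias
FROM OPENROWSET (BULK 'c:\temp\mytxtfile.txt', SINGLE_CLOB) AS MyFile;
```

Omitting the correlation name (MyFile here) causes a syntax error, which is an easy mistake to make when adapting the statement.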
Reading single files comes with a few additional requirements, described below.
Access control is always a concern. The operating system level file operations that read the file run under a Windows account, so only files accessible to that account may be read. For a login that uses SQL Server authentication, that account is the one the SQL Server database engine service runs as. For a login that uses Windows authentication, SQL Server generally uses the caller's own credentials instead. Network drives or UNC paths are permitted if the account has the privileges. If you want to read network files through the service account, run SQL Server as a domain user.
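Besides file system access, the caller also needs permission inside SQL Server to run bulk operations. A minimal sketch of granting it follows; the login name MYDOMAIN\FileLoader is a placeholder invented for this example, not something from the tip.

```sql
-- ADMINISTER BULK OPERATIONS is a server-level permission,
-- so it is granted while connected to the master database.
USE master;
GO
-- [MYDOMAIN\FileLoader] is a hypothetical login; substitute your own.
GRANT ADMINISTER BULK OPERATIONS TO [MYDOMAIN\FileLoader];
```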
The BULK provider won't convert between Unicode and non-Unicode files. You must tell it which type of encoding the file uses. If you don't, the result is error 4806, as seen here:
SELECT BulkColumn FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_CLOB) MyFile
Msg 4806, Level 16, State 1, Line 1
SINGLE_CLOB requires a double-byte character set (DBCS) (char) input file. The file specified is Unicode.
Unicode files must be read with the SINGLE_NCLOB option shown here:
SELECT BulkColumn FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_NCLOB) MyFile
Similarly, files with non-text structures, such as Word documents, are not converted. They must either be converted by some other mechanism before being read, or be read as binary files with the SINGLE_BLOB option.
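Reading a binary file works the same way as the text examples. The sketch below stores a Word document, byte for byte, in a varbinary(max) column. The table dbo.DocumentStore, its columns, and the file c:\temp\SampleWordDoc.doc are all hypothetical names made up for this illustration.

```sql
-- Hypothetical table for this sketch; not part of the original tip.
CREATE TABLE dbo.DocumentStore
(
    DocumentId   INT IDENTITY(1,1) PRIMARY KEY,
    FileName     NVARCHAR(260)  NOT NULL,
    FileContents VARBINARY(MAX) NOT NULL
);
GO

-- SINGLE_BLOB returns the file's bytes unchanged as varbinary(max),
-- so no character set conversion is attempted.
INSERT INTO dbo.DocumentStore (FileName, FileContents)
SELECT N'SampleWordDoc.doc', Doc.BulkColumn
FROM OPENROWSET (BULK 'c:\temp\SampleWordDoc.doc', SINGLE_BLOB) AS Doc;
```

SQL Server does not interpret the document's contents here; it only stores the bytes, so anything that needs the text still has to convert the file outside this statement.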
OPENROWSET isn't flexible about how you provide the name of the file. It must be a string constant; a variable or expression is not accepted. That requirement forces the use of dynamic SQL when the file name isn't known in advance.
Here's a stored procedure that reads any text file and returns the contents as an output variable:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROC [dbo].[ns_txt_file_read]
    @os_file_name NVARCHAR(256)
   ,@text_file VARCHAR(MAX) OUTPUT
/* Reads a text file into @text_file
*
* Transactions: may be in a transaction but is not affected
*               by the transaction.
*
* Error Handling: Errors are not trapped and are thrown to
*                 the caller.
*
* Example:
    declare @t varchar(max)
    exec ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output
    select @t as [SampleTextDoc.txt]
*
* History:
* WHEN       WHO        WHAT
* ---------- ---------- ---------------------------------------
* 2007-02-06 anovick    Initial coding
**************************************************************/
AS
DECLARE @sql NVARCHAR(MAX)
      , @parmsdeclare NVARCHAR(4000)

SET NOCOUNT ON

-- OPENROWSET needs the file name as a string constant, so build the
-- statement dynamically. Single quotes in the name are doubled so that
-- they cannot end the string literal early.
SET @sql = 'select @text_file=(select * from openrowset (
           bulk ''' + REPLACE(@os_file_name, '''', '''''') + '''
           ,SINGLE_CLOB) x
           )'

SET @parmsdeclare = '@text_file varchar(max) OUTPUT'

-- Run the statement and pass the file contents back through the
-- OUTPUT parameter.
EXEC sp_executesql @stmt = @sql
                 , @params = @parmsdeclare
                 , @text_file = @text_file OUTPUT
To see how it works, first create a text file called "SampleTextDoc.txt" in c:\temp and add some text to it. For our example we added the text "The quick brown fox jumped over the lazy dog.". Then execute the example script:
DECLARE @t VARCHAR(MAX)
EXEC ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output
SELECT @t AS [SampleTextDoc.txt]
The results are:
SampleTextDoc.txt
The quick brown fox jumped over the lazy dog.

(1 row(s) affected)
Reading text files this way is remarkably fast because the files are read sequentially. Using a 64-bit SQL Server on a development machine, reading a file of 750,000,000 bytes took only 7 seconds, or roughly 107 MB per second.
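To take a similar measurement on your own system, something like the following sketch could be used. It assumes the ns_txt_file_read procedure shown earlier has been created; the variable @started and the output column names are illustrative, and the timings you see will depend on your hardware and file.

```sql
DECLARE @t VARCHAR(MAX)
      , @started DATETIME

-- Note the time just before the read begins.
SET @started = GETDATE()

EXEC dbo.ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t OUTPUT

-- Report the number of bytes read and the elapsed time in milliseconds.
SELECT DATALENGTH(@t)                    AS BytesRead
     , DATEDIFF(ms, @started, GETDATE()) AS ElapsedMs
```

Running it more than once gives a fairer picture, since the first read of a file may include disk access that later reads avoid through operating system caching.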
- If you need to bulk insert large text files or binary objects into SQL Server, look at using OPENROWSET.