Importing Data From Excel Using SSIS - Part 1

By:   |   Comments (23)   |   Related: More > Integration Services Excel


Problem

Recently while working on a project to import data from an Excel worksheet using SSIS, I realized that sometimes the SSIS Package failed even though when there were no changes in the structure/schema of the Excel worksheet. I investigated it and I noticed that the SSIS Package succeeded for some set of files, but for others it failed. I found that the structure/schema of the worksheet from both these sets of Excel files were the same, the data was the only difference. How come just changing the data can make an SSIS Package fail? What actually causes this failure? What can we do to fix it?  Check out this tip to learn more.

Solution

You must be wondering why the changes in the data can cause the SSIS Package to fail. Before I can talk about this issue in detail, first let me demonstrate this issue with an example.  This example should demonstrate the actual failure and solution for this problem. As you can see in the image below, I have 18 records in the Excel worksheet, when I ran my SSIS Package to load the data from this worksheet, it worked fine.

Sample Excel worksheet with ProductDescriptionID and Description columns

In the next image, I made some changes to row number 7. The description for ProductDescriptionId 907 is much larger than the previous data load. When I ran my SSIS package again to load the data from this worksheet, it worked fine as well.

Sample Excel worksheet that has a long Description in the seventh record

In the next image, I reverted the previous change and made some changes to the row number 14. The description for ProductDescriptionId 1203 is much larger than the previous data load. When I ran my SSIS package again to load the data from this worksheet, it failed with the following exception:

[Excel Source [1]] Error: There was an error with output column "Description" (18) on output "Excel Source Output" (9). The column status returned was: "Text was truncated or one or more characters had no match in the target code page.".
[Excel Source [1]] Error: The "output column "Description" (18)" failed because truncation occurred, and the truncation row disposition on "output column "Description" (18)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Excel Source" (1) returned error code 0xC020902A. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

Sample Excel worksheet that has a long Description in the 14th record

What caused the above SSIS package to fail?

SSIS Excel Connection Manager determines the data type of each column of the worksheet on the basis of the data of that particular column from the first 8 rows. This is the default behavior and the SSIS connection manager uses a value in the registry to determine the number of rows for the data type determination.  Before I can explain more about the registry key and it's setting, let's see if we can do anything in the SSIS package to prevent this issue.

In your SSIS Package, right click on the Excel source in the data flow task and click on "Advance Editor for Excel Source".  Next change the data type and length of the column from its default value. For example, in my case I have a "Description" column which has text data and the length is up to 500 characters. I have put max length for this column as 1000. Now click on "OK" button.

Advanced Editor for Excel Source where the description column is changed to a length of 1000

But what is this, the validation of the Excel Source failed itself as you can see below:

Red X on the Excel Source task that indicates the Description column has an issue

When you double click on the Excel Source task, it will inform that the component is not in a valid state and asks for your confirmation to fix this issue. When you click on the "Yes" button it will reset the data type and length of the column to what it was earlier, before we made the changes as above.

SSIS Editing Component Error indicating the properties do not match for the Description column

Now to summarize the whole thing : SSIS Excel Connection manager determines the data type and length of columns from the worksheet on the basis of the first eight rows of data. Even though we can change the data type and length from the Advance Editor for Excel Source, it will not be valid and the information will be reset by SSIS Excel Connection manager automatically using the same determination process.

Fixing the problem now in the registry

So as I said before, the number of rows to consider when determining the data type and length is determined by a registry key called TypeGuessRows by the SSIS Excel Connection Manager. By default its value is 8 and hence 8 rows are considered when determining the data type and length.

Now coming back to the solution, these are some of the options to address this problem:

  • Configure the source system to provide your Excel file in which data is sorted on the basis of the length of the data in each column so that largest value of each column appears in the first row and alphanumeric data appears before numeric data.
  • Configure the source system to provide a dummy record in your Excel file as first row with desired data type and size; then after data import you can remove/delete that dummy record from the database.
  • Configure the source system to provide you a csv file instead of Excel file because with a csv file you have more control to determine data type and length of a column.
  • Changing the TypeGuessRows registry key to 0 from its default value of 8. This will make the Excel connection manager consider all the rows when determining data type and length of each column.

Unfortunately, the first three options do not apply in my scenario as I do not have control on the source system providing data in worksheet.  With this being said, I made the change in the TypeGuessRows registry key and updated its value from 8 to 0. After making this change, my same package worked like a charm for the same Excel worksheets for which it failed last time.

Registry Key Location - [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel]

Please Note : On a 64-Bit Windows Server machine the registry key will be available here:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel

Registry value for the TypeGuessRow key

Please be cautious when changing the value for TypeGuessRows registry key and keep these points in mind:

  • TypeGuessRows registry key is global setting and its not only going to impact your SSIS Package, but it will impact every place where it is referenced.
  • Changing the value for TypeGuessRows registry key to 0 from its default value of 8, makes the Excel connection manager consider all the rows when determining the data type and length of each column and hence it could have a severe impact on the performance of your SSIS package if the number of rows in your Excel worksheet is large.

Now, as we saw, to fix this issue the easiest solution is to make change in the registry, but in many scenarios you will not have control in making this change as the servers are managed by the Operations Team or there might be several other applications running on the same machine. Even if you have control in changing this setting, this change might cause your SSIS Package to perform poorly, based on the amount of data you have in your Excel worksheet, and it may impact other systems as well when this registry key is being referenced. So now the question is, is there any way we can avoid making changes in the registry, but still solve the problem? Well, stay tuned for part 2 of this tip for a solution which does not require a registry change, but still solves the problem.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Arshad Ali Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, July 18, 2014 - 7:13:04 AM - laxman Back To Top (32769)

Hi,

How to handle multiple exel sheets data move to destination,Please help.


Friday, June 20, 2014 - 7:49:26 AM - GuruNoT Back To Top (32323)

I usually don't leave comments when I get online help from a technical blog which is very bad on my part, however in this case I want to say thanks ti the author for providing such quality information and tips. I have always find his tips very useful and helpful as a Junior developer. Its frustrating sometimes that the proprietor Microsoft does not provide more relative assistance for the bugs on there products with solutions of the  high same quality as Arshad.

Thanks very much


Wednesday, November 6, 2013 - 3:51:58 AM - Irit Back To Top (27412)

Hello,

My problem in the ETL loading from excel

I have more than 200 fields(columns) to be loaded, but the source is choosing the only a specific number of columns (about 170)

How can i resolve this problem?

 

Thanks


Tuesday, July 16, 2013 - 1:53:14 PM - Nik Back To Top (25862)

I loved the article, but generally convincing the customer to change something in the registry is very difficult, my experience was “Admin war”, especially if you are working with a bank

 

But correct me if I am wrong, cant the “TypeGuessRows” be set to 0 in the connection string? And if it can will it solve the problem? Something like…

 

conn.ConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\Nick\Desktop\Pricing2.xlsx" + @";Extended Properties=""Excel 12.0 Xml;HDR=Yes;IMEX=1;ImportMixedTypes=Text;TypeGuessRows=0"""; 

 

Generally I change my connection string in a SCRIPT TASK (mainly setting the file name and etc...) and loop through each Excel file and I add the 2 main things => TypeGuessRows=0  and  IMEX=1

Thanks again for the article

Sincerely

Nik


Tuesday, July 9, 2013 - 7:02:59 PM - Chris Chitemerere Back To Top (25767)

The suggestion to sort the data table in descending order with the column with the highest length as the first row in the worksheet works perfect, i had 1948 records import with no trouble


Thursday, June 20, 2013 - 11:08:56 AM - Lorne Corrigan Back To Top (25504)

I have never seen this information before either.  This saves me hundreds of hours Mickey Mousing the packages.  The Regedit information is a miracle.

Thank you.


Sunday, June 2, 2013 - 11:16:57 AM - Faran Back To Top (25240)

Hi Arshad, I have tried it by changing the value in the registry, but it doesn't work. Any advice? Whats are the lengths of the columns in the prodoctInfo table?

 

Thanks

Faran


Thursday, April 18, 2013 - 1:38:13 PM - KV Back To Top (23426)

I have been trying to figure out a work around for this problem for about a week.  I have gone so far as to try to change the registry (I had the network administrator do it.)  Even doing this, it still did not work.  I am dealing with a very large file . . . about 67K records.  The field is coming in as a DT_WSTR field of 255 because of the fact there is no data in the field until somewhere in the 200 record mark.  I cannot get it to set to DT_NTEXT.  Why would changing the registry not fix this problem . . . do you have any idea?


Sunday, April 7, 2013 - 12:39:51 PM - Steve Purdy Back To Top (23222)

I have a situation where the worksheet contains at least three different data layouts (tables) and the data I am interested in is in the third layout (tabl) which starts about 800 rows down the spreadsheet. It is impossible to ask the provider - a multi national organisation - to alter the format of the work sheet, so an suggestions would be most appreciated


Sunday, March 24, 2013 - 12:48:34 PM - Inspired Back To Top (22989)

A non-registry alternative (hack) is to have the cell with a large value in the first 8 rows.

In my case I used 1,000 spaces appended to a cell value then used Trim() when I read the worksheet cells.

I also hoghlighted the cell in yellow as a reminder.


Friday, February 1, 2013 - 3:29:36 PM - David Bridge Back To Top (21867)

this is not a bug from ms in sql or excel. excel may look like a table but it isnt. its a matrix of cells, each of which can have its own type.

 

ms had to do something to determin type where we have used excel in this way and this solution works for most cases. the biggest issue you face when importing from excel is ehere tge type changes in those top 8 rows. thats a real pain but dont ever edit the reg to overcome this.

i wrote a blog on this issue for sql 2008. its not a new thing.

dave


Monday, December 10, 2012 - 12:17:00 AM - Ranjith M Back To Top (20844)

We can change the field width using advanced editor without changing anything in Registry.

  1. Go to the Data Flow Task property
  2. Set ValidateExternalMetaData=false and proceed to change column width as you want

Thanks & Regards

Ranjith

 


Wednesday, November 21, 2012 - 12:57:13 AM - Scot Back To Top (20442)

You would have thought there would be value on the connection that allows you to either 'scan file, determine and import' or just 'import based on my specification' This particular problem causes so many headaches because everyone uses excel (rather than csv) and there's always titles and other 'human' stuff in the top rows that screw up the datatypes. I know what the datatypes are in the excel files but I can't force SSIS to use them. Stupid thing. I wonder what the real reasoning is behind this.


Monday, October 15, 2012 - 6:03:44 AM - Mahendra Back To Top (19923)

Nice example..Thanks a lot


Tuesday, October 9, 2012 - 4:07:26 PM - Izhar Azati Back To Top (19840)

See the note in http://connectionstrings.com/excel#microsoft-jet-ole-db-4-0

 


Tuesday, October 9, 2012 - 12:52:31 PM - Will Back To Top (19835)

You're joking right? This is easily the worst solution to this problem I have ever read. You want me to change a registry setting on a production server that you admittedly say will make the package perform poorly, and could negatively impact other systems. Then you tease a different method that you can do without making the registry edit.

The other method should have been your whole article and registry edit a footnote with a big fat warning saying, "Don't ever do this because it's the worst idea ever."


Monday, October 8, 2012 - 5:38:10 AM - Terry Back To Top (19817)

As hinted by other comments, this does seem to be a workaround for yet another Microsoft BUG!


Friday, September 21, 2012 - 8:25:59 AM - Edward Back To Top (19607)

Or you can edit the mapping of the created columns to dictate exactly the size of the columns to create which I have always done.


Thursday, September 20, 2012 - 10:01:00 PM - michael Back To Top (19604)

We are using opendatasource within sql to insert excel data into staging tables and we are constantly running into problems especially with security/impersonation and text fields being larger than 255 not being inserted.  Microsoft needs to get their act together and fix the ACE.12 driver and most likely previous excel drivers because excel plays a major part in office users lives and this should be more stable for production use. Lets hope they get the message?


Thursday, September 20, 2012 - 2:10:38 PM - TonyC Back To Top (19596)

This is terrible! Nothing against you Arshad, you are just providing a "work around" (not a solution) to a SQL Server problem - so this is appreciated. All of the work arounds suggested so far have too large of an associated risk. Changing the REGISTRY affects everything - no good. Sorting the data to get the large column size is also bad - one could have lots of rows, albeit you would only affect packages where you made this change explicitly.

A real solution (from MS) would be to allow control over the LENGTH of the data as the property window "seems" to provide. I wonder why MS chooses allow one to update this but then disallows the implementation (outside of changing the REG KEY).

 


Thursday, September 20, 2012 - 11:57:27 AM - Rp Back To Top (19593)

Nice article. Would like to know more. Waiting for Importing Data From Excel Using SSIS - Part 2


Thursday, September 20, 2012 - 11:39:21 AM - Bill Back To Top (19592)

Good article!  I've never seen this particular bit of information presented anywhere else.


Thursday, September 20, 2012 - 7:56:24 AM - G Bryant McClellan Back To Top (19588)

Another alternative to changing the registry value is to 'configure' your source data through sorts or exclusions to get a worst case scenario. This gives you a starting point without the possibly negative impact of a global change.















get free sql tips
agree to terms