Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Strip double quotes from an import file in Integration Services SSIS


By:   |   Read Comments (32)   |   Related Tips: More > Integration Services Development

Problem

When loading data using SQL Server Integration Services (SSIS) to import data from a CSV file, every single one of the columns in the CSV file has double quotes around the data. When using the Data Flow Task to import the data I have double quotes around all of the imported data. How can I import the data and remove the double quotes?

Solution

This is a pretty simple solution, but the fix may not be as apparent as you would think. Let's take a look at our example.

Here is the sample CSV file as it looks in a text editor. You can see that all of the columns have double quotes around the data even where there is no data. The file is comma delimited, so this should give us enough information to import the data column by column.

sample csv data

To create the package we use a Data Flow Task and then use the Flat File Source as our data flow source.

ssis control flow ssis data flow

When setting up the Flat File Connection for the data source we enter the information below, basically just selecting our source file.

ssis connection manager

If we do a quick preview on the dataset we can see that every column has the double quotes even the columns where there is no data. If you open the text file in Excel the double quotes are automatically stripped, so what needs to be done in SSIS to accomplish this.

ssis flat file connection

On this screen you can see the highlighted area and the entry that is made for the "Text qualifier". Here we enter in the double quote mark " and this will allow SSIS to strip the double quotes from all columns.

ssis text qualifier

If we do another preview we can see that the double quotes are now gone and we can move on to the next part of our SSIS package development.

ssis data preview from connection manager

As mentioned above, this is a simple fix to solve this problem. If you are faced with this issue, hopefully this gives you a quick answer to get your development moving forward. This same technique can be used to strip any other text qualifier data from your files.

Next Steps


Last Update:






About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Wednesday, June 14, 2017 - 4:42:50 PM - Greg Robidoux Back To Top

Hi SheFixesThings,

I just used the SQL Server Import and Export Wizard to import the 3 sample lines you have.  I am guessing the end table should have 6 columns.

These are the options I used:

Format: Delimited (comma)

Text qualifier: "

Header row delimiter: {CR}{LF}

This worked fine for the 3 example lines you show below.

-Greg


Wednesday, June 14, 2017 - 4:33:08 PM - Greg Robidoux Back To Top

Hi SheFixesThings,

not sure what the end result is for your examples.  Can you show how the data should be separated.

It looks like there are six columns, but wanted to be sure.

Thanks


Wednesday, June 14, 2017 - 3:53:11 PM - SheFixesThings Back To Top

I recently just started a new project I just came upon this same issue wiith my source data, sort of. My files are from external sources who we cannot change how they provide what they send so I had to find a solution but would like to improve upon that solution so ideas welcome. Using this idea for on how to change the delimiters, what can a person do when there are varying delimiters? 

for example:

1. xxx,yyy,zzz,"abc,def,ghi",aaa,bbb   -- multiple commas within a set of "

2. "abc,xxx","abcabc",bbb,ccc,ddd,zzz   --different colum with a comma within the set of "

3. aaa,bbb,c,dd5655d,eeeeeeeae,ffff     -- no " at all and column widths can vary for all data

From the inital import of the file saved as CSV, I started the import into a temp table with ONE row defined as a varchar(max) as well as an Identity row so that the initial data will hopefully be numbered as it came in. 

Then I split that table into 2 new temp tables:  a DQ (double-quote) and non-DQ using  WHERE col LIKE '%"%'   ---it it contained a " anywhere in the row. 

 

Then in a SQL Script I wrote some messy SQL script (it's below if you REALLY want to see it) to find and replace those "columns" with the embedded commas to spaces, or even pipe if it's needed to convert back to comma. I think I should change to a loop in case I end up with more than 2 sets of " but I am pressed for time because I have 20 more of these lovely files to figure out how to handle.

Then I copied the modified table and the non-modified into another temp table(yes, lots of temp tables, but I needed to be able to trace to make sure it worked). 

Ideally, I was hoping for a way to now take that final modified table and ETL it into another formatted/columnar defined table with 55 cols for this particular file and have SSIS(I'm 2012) do the dirty work like it does when a source is csv/txt/excel, but I couldn't find anything that could handle that. 

   ??? Does anyone know how to take a SQL table (NOT external file) as the source, which contains comma-delimited data, and send to a destination without using script logic? ????

Due to my time-crunch, what I ended up doing is then taking my newly created and cleaned up of commas between ", aka the 'pre-final CSV SQL TABLE, and exporting to a CSV destination file and then setting up a new Connection Mgr to read that new CSV as a Source and ETLing into the final 55 columns table. 

Any ideas? 

 

Brace yourself, it's rather crazy SQL down here...

I'd LOVE to avoid the manual coding because I'm going to be in a pickle if subsequent monthly files end up with more than 2 sets of " :( 

 

 ---converts those rows that contain embedded commas within 2 sets of 2 (for now) " to blanks. 
-- then the output from here will be joined up with those w/o " and then sent back to csv to reimport as corrected csv
-- yes, a roundabout way but best I could think of right now

 

truncate table [homeServer].[dbo].[temphold_SourceData1_varcharMaxRAW]

--created this to keep the original identity ID created from the initial import

 INSERT INTO [homeServer].[dbo].[temphold_SourceData1_varcharMaxRAW]

--replace the 2nd set with values from p5 as the source and there should be no more commas

select cast(p6.ID  as int) as OrigID
         ,cast(REPLACE(p6.newColumnRAW1,p6.set2,p6.NEWset2) as varchar(max)) as newColumnRAW
FROM (--replace the first set with values from the p5 set
      select p5.ID
              ,REPLACE(p5.ColumnRAW,p5.set1,p5.NEWset1) as newColumnRAW1
              ,p5.set2
              ,p5.NEWset2
      from (--chg comma to space if between the " in the orig data (p4) and join to data that contains 4 positionals containing " (p3)
               SELECT p4.ID
                          ,p4.ColumnRAW   
                          ,p3.pos1
                          ,p3.pos2
                          ,p3.pos3
                          ,p3.pos4  
                          ,substring(p4.columnraw, p3.pos1,((p3.pos2-p3.pos1) + 1)) as set1
                          ,substring(p4.columnraw, p3.pos3,((p3.pos4-p3.pos3) + 1)) as set2
                          ,REPLACE(substring(p4.columnraw, p3.pos1,((p3.pos2-p3.pos1) + 1)), ',',' ') as NEWset1  --changed here to space
                          ,REPLACE(substring(p4.columnraw, p3.pos3,((p3.pos4-p3.pos3) + 1)), ',',' ') as NEWset2  --changed here to space
              FROM temphold_SourceData1_DQData p4            

   -- join from orig by ID created by other DTSx RAW import to those with dq positions

               INNER JOIN (select pos1
                                          ,pos2
                                          ,case 
                                             when xpos3 < pos2
                                               then 0
                                                else xpos3
                                            end as pos3
                                ,case
                                   when xpos3 < pos2  
                                     then 0
                                       else charindex('"',ColumnRAW,(xpos3+1)) 
                                 end as pos4

                                ,columnRAW
                                ,ID
                          FROM (--pass2 of data - get position3 having "
                                     select  pos1
                                               ,pos2
                                               ,charindex('"',ColumnRAW,(pos2+1)) as xpos3
                                               ,columnRAW
                                               ,ID
                                     FROM (--pass1 of data - get positions 1 and 2 of those containg " (dbl quotes = dq)
                                               select ID,
                                                        cast(charindex('"',ColumnRAW) as int) as  pos1
                                                       ,cast(charindex('"',ColumnRAW,(charindex('"',ColumnRAW)+1)) as int) as pos2
                                                       ,columnraw
                                              from temphold_SourceData1_DQData
                    --testing here    where id IN (5,1537)
                                              ) p1
                                 ) p2
                         ) p3 

                 ON p4.ID = p3.ID
           ) p5  
     ) p6

 


Wednesday, May 24, 2017 - 10:08:03 AM - gregor Back To Top

srsly this helped me alot. Thanks man.


Friday, April 21, 2017 - 6:12:01 PM - Lydia Back To Top

You just saved me hours of work! Great explanation, thank you sir!!!


Friday, May 13, 2016 - 12:28:04 PM - Joe Lehane Back To Top

 How do you cope with commas within the fields which were double quoted?

 


Thursday, February 04, 2016 - 12:00:17 PM - Brad Back To Top

 Thanks for this!  It was very simple and worked very well.

 


Tuesday, August 25, 2015 - 1:54:28 PM - Mateen Siddiqui Back To Top

thanks that worked!


Friday, March 20, 2015 - 4:19:05 PM - Kris Back To Top

Thanks for educating the community with this Tip. I do appreciate your volunteer-ship.

My Questions are:

1. Text Qualifier accepts only qualifier or multiple?

 

2.  I have text file as below

 

10,"Kris",'Acct', "$45.00", $666,New York

 

How to handle such file using Text Qualifier?

 

Thanks

 

 


Friday, March 20, 2015 - 11:55:25 AM - Sancho De Jesus Back To Top

This post is very helpful. Thank you for posting.


Friday, March 20, 2015 - 8:26:53 AM - Greg Robidoux Back To Top

Hi Ajit,

there is not an easy way to do this since this process thinks that a double quote is a text qualifier.

You could try to add another process to strip out all double double quotes.  This could be done in the package or you could do outside of SSIS. 

There may be another way to do this, but I am not sure offhand.

-Greg


Friday, March 20, 2015 - 7:08:16 AM - ajit Back To Top
please check below i am not able to separate "" in bold
thanks please check and let me know 
 
 
"03",4186,2015-03-16 00:00:00,4,"VALFLOWCVA",3.000,42.00,"66        ","1290","FOR G.S.T ENDFACING VICE LINER","",,,,,,"FLOW CONTROL VALVE, MODEL NO.-IL 340 1,0.2 -10 BAR, SIZE=3/4"", MAKE=SCHRADER BELLOWS, FOR TUBE MILL - 02 CUTTING CO2",,"","NOS   ",,,"","N","",,"N",.0000,.00,"N","N",""

Thursday, September 04, 2014 - 8:43:46 PM - Alex Back To Top

I have the same issue; how do we get around this?


Wednesday, May 28, 2014 - 9:38:03 AM - Inam Back To Top

Thank you. your post solved my problem.


Wednesday, October 16, 2013 - 5:51:24 AM - Bhoj Back To Top

Good One !!


Tuesday, April 16, 2013 - 5:04:03 AM - Mikel Back To Top

Hi,

Is there a way to remove the double quote for those column with no data only.  We want to keep the double quote for those column that has data.


Thursday, April 04, 2013 - 11:34:02 AM - Driq Back To Top

 

Hehe..this is so simple,saves a lot of time. Thanx.


Monday, February 25, 2013 - 3:13:30 PM - Carol Back To Top

OK - I found the problem, and it's environment specific.  My development environment substitutes _x0022_ in the "TextQuallifier" property, and my Production Environment doesn't understand that...  Right click your .dtsx, view the source and look for TextQualifier.  If you port it over to your production environment, fix the TextQualifier and recompile, it finally works....

Thank you Jamie Thomson!

http://sqlblog.com/blogs/jamie_thomson/archive/2011/10/11/31040.aspx

 


Thursday, February 07, 2013 - 1:06:52 AM - Carol Back To Top

My SSIS package runs great interactively from Visual Studio.  However, after I deploy to SQL Studio 2008/R2 and schedule it the quotes are imported into the table.

Can anyone please help?

Thanks


Thursday, January 31, 2013 - 6:15:09 AM - Ashish Back To Top

 

How about quote and coma inside quote?

how parse folowing txt or csv file formate?

"12333","Ashish","Ashish , Patel 6"","24566"

1. 12333

2. Ashish

3. Ashish , Patel 6"

4. 24566


thanks


Thursday, January 03, 2013 - 4:50:33 PM - Suman Back To Top

Thanks a lot, this worked like a charm.

 

 

 


Friday, November 30, 2012 - 10:03:15 PM - Ben.Gu Back To Top

Thank you very much.

it worked.


Friday, November 16, 2012 - 4:58:06 PM - Joanie Back To Top

Thank you.  Worked very well!


Friday, August 03, 2012 - 11:06:49 AM - Prolay Back To Top

Thanks a lot , a little " makes a lot difference

 


Friday, June 15, 2012 - 9:14:01 AM - FunkyKoval Back To Top

Hi

 

I have similar problem, also I work with file with double quotes at begin and at the end of some columns in csv file

but unfortunately some lines are wrong, they start with double quot but do NOT end with it!

of course data flow task fail, how do you sugest to repair the before running data flow tusk to be the most effective

unfortunately files are quite big >200MB

 

Funky


Thursday, February 23, 2012 - 11:56:15 AM - Suresh Back To Top

 

Thanks Greg, It fixed my issue


Monday, February 20, 2012 - 9:43:19 AM - AmyNxDx Back To Top

Thanks! This simple fixed helped me TONS!!


Tuesday, May 03, 2011 - 8:27:23 PM - Adam Back To Top

Do'H!!!

Such a simple fix. I started manually pulling out the data and ran into a bunch of data conversion errors down the road.

Saved me a bunch of time. Thanks!


Wednesday, March 23, 2011 - 12:07:15 PM - Pete Back To Top

I have the same problem; how do we get around this!?


Wednesday, June 30, 2010 - 12:20:07 PM - jerryol Back To Top

Ha, ha.  When I google "undouble kirk haselden" I do get 4 hits.  Three of them point to this article, and one is junk.  Thanks anyway.


Monday, August 31, 2009 - 6:13:21 PM - MikeInNH Back To Top

 Hi William

Yeah, the tip published here doesn't work so good if the field separator (comma) is included in the quoted text.  However, I believe if you Google "undouble kirk haselden" you will get about 4 hits.  You probably know that Haselden is the "father of SSIS" - His solution is to use a script component to parse lines.  After some Googling you can probably get the code (in Visual Basic .NET I believe) for the component. 

 Hope that helps. 

Mike 


Monday, April 07, 2008 - 9:57:46 AM - william.weber Back To Top

I realize this is a very old tip, but I've been searching endlessly for a solution to my problem.

How do you deal with embedded text qualifiers in SSIS? I know that DTS dealt with this stuff correctly and I can't for the life of me figure out how to make SSIS read the file correctly. For example:

"This is field1","This is ""field2"""

 In DTS if you selected " as your text qualifier it would correctly give you the following:

This is field1, This is "field2"

In SSIS it just breaks. Any help is greatly appreciated.


Learn more about SQL Server tools