Strip double quotes from an import file in SQL Server Integration Services (SSIS)



Problem

When loading data from a CSV file using SQL Server Integration Services (SSIS), every column in the file has double quotes around the data. When I use the Data Flow Task to import the data, the double quotes come through with it. How can I import the data and remove the double quotes?

Solution

This has a pretty simple solution, but the fix may not be as apparent as you might think. Let's take a look at an example.

Here is the sample CSV file as it looks in a text editor. You can see that all of the columns have double quotes around the data, even where there is no data. The file is comma delimited, so this should give us enough information to import the data column by column.

sample csv data

To create the package, we use a Data Flow Task and a Flat File Source as our data flow source.

ssis control flow ssis data flow

When setting up the Flat File Connection Manager for the data source, we enter the information below, basically just selecting our source file.

ssis connection manager

If we do a quick preview of the dataset, we can see that every column has the double quotes, even the columns with no data. If you open the text file in Excel, the double quotes are stripped automatically, so what needs to be done in SSIS to accomplish the same thing?

ssis flat file connection

On this screen you can see the highlighted area and the entry made for the "Text qualifier". Here we enter the double quote character (") and this tells SSIS to strip the double quotes from all columns.

ssis text qualifier

If we do another preview we can see that the double quotes are now gone and we can move on to the next part of our SSIS package development.

ssis data preview from connection manager

As mentioned above, this is a simple fix for this problem. If you are faced with this issue, hopefully this gives you a quick answer to get your development moving forward. The same technique can be used to strip any other text qualifier from your files.
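To see what the text qualifier does under the hood, here is a small illustration (not from the original tip) using Python's csv module: its quotechar parameter plays the same role as the SSIS Text qualifier, stripping the qualifier from each field. The sample rows are invented to match the shape of the file shown above.

```python
import csv
import io

# Invented sample shaped like the article's file: every field, even empty
# ones, is wrapped in double quotes.
sample = (
    '"101","Jones","Mary","","2023-01-15"\n'
    '"102","Smith","","Boston","2023-02-01"\n'
)

# quotechar is the counterpart of the SSIS "Text qualifier" setting: the
# parser removes the qualifier and returns the bare field values.
rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))
for row in rows:
    print(row)
```

A flat-file engine that honors the qualifier also treats a delimiter inside a qualified field as data, which is why the setting matters for values like "13,066.41" that come up in the comments below.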


About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Wednesday, July 28, 2021 - 8:58:33 AM - Greg Robidoux Back To Top (89063)
Hi Chintan, for your example you might want to load all columns as strings, strip the " from the beginning and the end of each value after the data is loaded, and then convert the data to the appropriate data types.

Another option is to preprocess the file before SSIS reads it, maybe using PowerShell or some other tool: replace any " that is the first or last character of a row, then replace "," with just ,.

This should strip out all of the double quotes.

-Greg
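Greg's preprocessing suggestion can be sketched in a few lines. This illustration uses Python rather than PowerShell (the logic is the same) and runs on the sample row from Chintan's question below; the function name is made up for the example.

```python
def strip_qualifiers(line: str) -> str:
    """Drop a qualifier at the start/end of the row, then turn the
    "," separators into bare commas. Embedded quotes such as the ones
    around ("Pietro") are left in place."""
    line = line.rstrip("\r\n")
    if line.startswith('"'):
        line = line[1:]
    if line.endswith('"'):
        line = line[:-1]
    return line.replace('","', ",")

print(strip_qualifiers('"Worker","Y","Mr","Peter ("Pietro")","Marciano","1973-09-19'))
```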

Wednesday, July 28, 2021 - 3:40:37 AM - Chintan Thakkar Back To Top (89061)
Hello,

This solution was helpful, but my csv file has " as the text qualifier and some of the columns contain " within the data.
Below is an example:

"Worker","Y","Mr","Peter ("Pietro")","Marciano","1973-09-19

The column Peter ("Pietro") has " in it.

How can I load such data via SSIS?

Friday, October 25, 2019 - 3:00:04 PM - Greg Robidoux Back To Top (82900)

Hi Sandeep,

I think you can just type in your own value even if it is not in the dropdown list.

-Greg


Friday, October 25, 2019 - 2:04:08 PM - Sandeep Back To Top (82899)

Hi,

I have to extract data from a .txt file to .xlsx, but my text file has ~ as the delimiter. Could you please let me know how to extract the data, since I didn't find the ~ delimiter in the dropdown.

Sample text format:

ABC~123~Yes~Santa Ana~CA~92701

xyz~456~Yes~Long Beach~CA~92702

AAA~789~Yes~Santa Monica~CA~92703


Tuesday, October 15, 2019 - 10:47:00 AM - Arturo Back To Top (82780)

Awesome, that's the answer I was looking for. This was my issue importing data from a CSV file:

Code, Amount,...

4428,"13,066.41", ..


Tuesday, November 28, 2017 - 10:25:41 AM - mg Back To Top (73340)

 All of the columns had double quotes around them.  


Tuesday, November 28, 2017 - 10:25:05 AM - mg Back To Top (73338)

What I ended up doing was using PowerShell to change the delimiter from a comma to a vertical bar:

Import-Csv "file1.csv" | Export-Csv "file1_1.csv" -Delimiter "|" -NoTypeInformation

Once I did that, I was able to get it to cooperate with SQL.


Tuesday, November 28, 2017 - 9:13:08 AM - Greg Robidoux Back To Top (73333)

Hi mg,

Do all of the columns have double quotes around them or just certain columns? 

You could bring in the data as character data, then do a replace to strip the double quote from the beginning and the end of each column, and then convert the data to the correct data type.

Not sure of an easier way to do this.

-Greg


Monday, November 27, 2017 - 2:34:59 PM - mg Back To Top (73296)

This works great, except when there are commas within the fields, and the import step gets confused because it thinks the in-field commas are delimiters.  Is there any way to skirt this issue?


Monday, November 13, 2017 - 3:44:00 PM - Andrey Shtepa Back To Top (69678)

Hello everyone!

I resolved that problem with the following approach.

Before loading data from the *.csv flat file that contains bad symbols like "", I run a Script Task in which C# replaces/cuts the bad symbols and saves a new *.csv file without them. Then I load the clean *.csv file without any problem.
It works quite quickly, even on big files up to 2GB.
I hope you follow me.

Moreover, we can try to use a regexp function to clean the data in a case like that.

 

 

/*
   Microsoft SQL Server Integration Services Script Task
   Write scripts using Microsoft Visual C# 2008.
   The ScriptMain is the entry point class of the script.
*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_d8c4cf0361da4569b65c8feedf8270c9.csproj
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            if  (Dts.Variables.Contains("NewFile") == true) 
            {

              System.IO.StreamReader sr = new System.IO.StreamReader(Dts.Variables["OrdersConvertAmountFileName"].Value.ToString());

                string strContent = sr.ReadToEnd();
                strContent = strContent.Replace("" + (char)34 + (char)34, "");
                strContent = strContent.Replace("0000-00-00", "0001-01-01");

                sr.Close();
                sr.Dispose();

                System.IO.StreamWriter sw = new System.IO.StreamWriter(Dts.Variables["OrdersConvertAmountFileName"].Value.ToString(), false);
                sw.Write(strContent);
                sw.Flush();
                sw.Close();
                sw.Dispose();
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            else
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
           

        }
    }
}


Thursday, August 17, 2017 - 4:47:56 PM - JP Back To Top (64979)

Does your last column still have quotes? I followed this method as well and I still get quotes on the last column, so the Flat File Source doesn't execute correctly.

 


Wednesday, June 14, 2017 - 4:42:50 PM - Greg Robidoux Back To Top (57421)

Hi SheFixesThings,

I just used the SQL Server Import and Export Wizard to import the 3 sample lines you have.  I am guessing the end table should have 6 columns.

These are the options I used:

Format: Delimited (comma)

Text qualifier: "

Header row delimiter: {CR}{LF}

This worked fine for the 3 example lines you show below.

-Greg


Wednesday, June 14, 2017 - 4:33:08 PM - Greg Robidoux Back To Top (57420)

Hi SheFixesThings,

not sure what the end result is for your examples.  Can you show how the data should be separated.

It looks like there are six columns, but wanted to be sure.

Thanks


Wednesday, June 14, 2017 - 3:53:11 PM - SheFixesThings Back To Top (57419)

I recently started a new project and came upon this same issue with my source data, sort of. My files are from external sources and we cannot change how they provide what they send, so I had to find a solution, but I would like to improve upon that solution, so ideas are welcome. Using this idea of how to change the delimiters, what can a person do when there are varying delimiters?

for example:

1. xxx,yyy,zzz,"abc,def,ghi",aaa,bbb   -- multiple commas within a set of "

2. "abc,xxx","abcabc",bbb,ccc,ddd,zzz   -- different column with a comma within the set of "

3. aaa,bbb,c,dd5655d,eeeeeeeae,ffff     -- no " at all and column widths can vary for all data

From the initial import of the file saved as CSV, I started the import into a temp table with ONE column defined as varchar(max), plus an Identity column so that the initial data would be numbered as it came in.

Then I split that table into 2 new temp tables: a DQ (double-quote) and a non-DQ, using WHERE col LIKE '%"%' -- if it contained a " anywhere in the row.

 

Then I wrote some messy SQL (it's below if you REALLY want to see it) to find and replace the commas embedded in those "columns" with spaces, or even pipes if needed to convert back to commas. I think I should change it to a loop in case I end up with more than 2 sets of ", but I am pressed for time because I have 20 more of these lovely files to figure out how to handle.

Then I copied the modified table and the non-modified into another temp table(yes, lots of temp tables, but I needed to be able to trace to make sure it worked). 

Ideally, I was hoping for a way to now take that final modified table and ETL it into another formatted/columnar defined table with 55 cols for this particular file and have SSIS(I'm 2012) do the dirty work like it does when a source is csv/txt/excel, but I couldn't find anything that could handle that. 

   ??? Does anyone know how to take a SQL table (NOT an external file) as the source, which contains comma-delimited data, and send it to a destination without using script logic? ???

Due to my time crunch, what I ended up doing was taking my newly created table, cleaned of the commas between " (aka the pre-final CSV SQL table), exporting it to a CSV destination file, and then setting up a new Connection Manager to read that new CSV as a source and ETL it into the final 55-column table.

Any ideas? 

 

Brace yourself, it's rather crazy SQL down here...

I'd LOVE to avoid the manual coding because I'm going to be in a pickle if subsequent monthly files end up with more than 2 sets of " :( 

 

 ---converts those rows that contain embedded commas within 2 sets of 2 (for now) " to blanks. 
-- then the output from here will be joined up with those w/o " and then sent back to csv to reimport as corrected csv
-- yes, a roundabout way but best I could think of right now

 

truncate table [homeServer].[dbo].[temphold_SourceData1_varcharMaxRAW]

--created this to keep the original identity ID created from the initial import

 INSERT INTO [homeServer].[dbo].[temphold_SourceData1_varcharMaxRAW]

--replace the 2nd set with values from p5 as the source and there should be no more commas

select cast(p6.ID  as int) as OrigID
         ,cast(REPLACE(p6.newColumnRAW1,p6.set2,p6.NEWset2) as varchar(max)) as newColumnRAW
FROM (--replace the first set with values from the p5 set
      select p5.ID
              ,REPLACE(p5.ColumnRAW,p5.set1,p5.NEWset1) as newColumnRAW1
              ,p5.set2
              ,p5.NEWset2
      from (--chg comma to space if between the " in the orig data (p4) and join to data that contains 4 positionals containing " (p3)
               SELECT p4.ID
                          ,p4.ColumnRAW   
                          ,p3.pos1
                          ,p3.pos2
                          ,p3.pos3
                          ,p3.pos4  
                          ,substring(p4.columnraw, p3.pos1,((p3.pos2-p3.pos1) + 1)) as set1
                          ,substring(p4.columnraw, p3.pos3,((p3.pos4-p3.pos3) + 1)) as set2
                          ,REPLACE(substring(p4.columnraw, p3.pos1,((p3.pos2-p3.pos1) + 1)), ',',' ') as NEWset1  --changed here to space
                          ,REPLACE(substring(p4.columnraw, p3.pos3,((p3.pos4-p3.pos3) + 1)), ',',' ') as NEWset2  --changed here to space
              FROM temphold_SourceData1_DQData p4            

   -- join from orig by ID created by other DTSx RAW import to those with dq positions

               INNER JOIN (select pos1
                                          ,pos2
                                          ,case 
                                             when xpos3 < pos2
                                               then 0
                                                else xpos3
                                            end as pos3
                                ,case
                                   when xpos3 < pos2  
                                     then 0
                                       else charindex('"',ColumnRAW,(xpos3+1)) 
                                 end as pos4

                                ,columnRAW
                                ,ID
                          FROM (--pass2 of data - get position3 having "
                                     select  pos1
                                               ,pos2
                                               ,charindex('"',ColumnRAW,(pos2+1)) as xpos3
                                               ,columnRAW
                                               ,ID
                                     FROM (--pass1 of data - get positions 1 and 2 of those containing " (dbl quotes = dq)
                                               select ID,
                                                        cast(charindex('"',ColumnRAW) as int) as  pos1
                                                       ,cast(charindex('"',ColumnRAW,(charindex('"',ColumnRAW)+1)) as int) as pos2
                                                       ,columnraw
                                              from temphold_SourceData1_DQData
                    --testing here    where id IN (5,1537)
                                              ) p1
                                 ) p2
                         ) p3 

                 ON p4.ID = p3.ID
           ) p5  
     ) p6

 


Wednesday, May 24, 2017 - 10:08:03 AM - gregor Back To Top (56006)

Seriously, this helped me a lot. Thanks man.


Friday, April 21, 2017 - 6:12:01 PM - Lydia Back To Top (55088)

You just saved me hours of work! Great explanation, thank you sir!!!


Friday, May 13, 2016 - 12:28:04 PM - Joe Lehane Back To Top (41484)

 How do you cope with commas within the fields which were double quoted?

 


Thursday, February 4, 2016 - 12:00:17 PM - Brad Back To Top (40596)

 Thanks for this!  It was very simple and worked very well.

 


Tuesday, August 25, 2015 - 1:54:28 PM - Mateen Siddiqui Back To Top (38539)

thanks that worked!


Friday, March 20, 2015 - 4:19:05 PM - Kris Back To Top (36614)

Thanks for educating the community with this tip. I do appreciate your volunteering.

My Questions are:

1. Does Text Qualifier accept only one qualifier, or multiple?

 

2.  I have text file as below

 

10,"Kris",'Acct', "$45.00", $666,New York

 

How do I handle such a file using the Text Qualifier?

 

Thanks

 

 


Friday, March 20, 2015 - 11:55:25 AM - Sancho De Jesus Back To Top (36611)

This post is very helpful. Thank you for posting.


Friday, March 20, 2015 - 8:26:53 AM - Greg Robidoux Back To Top (36606)

Hi Ajit,

there is not an easy way to do this since the process treats a double quote as a text qualifier.

You could try to add another process to strip out all doubled double quotes. This could be done in the package or outside of SSIS.

There may be another way to do this, but I am not sure offhand.

-Greg
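The stripping step Greg describes could look like this rough Python sketch (the same replace works in PowerShell or a Script Task). Note the tradeoff for data like ajit's below: removing the "" pairs also removes embedded inch marks, so SIZE=3/4"" becomes SIZE=3/4.

```python
def remove_doubled_quotes(line: str) -> str:
    # Drop every "" pair so the remaining single quotes act as clean
    # text qualifiers when SSIS reads the file.
    return line.replace('""', "")

print(remove_doubled_quotes('"FLOW CONTROL VALVE, SIZE=3/4"", MAKE=SCHRADER BELLOWS"'))
```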


Friday, March 20, 2015 - 7:08:16 AM - ajit Back To Top (36604)
Please check below; I am not able to separate the "" shown in bold.
Thanks, please check and let me know.
 
 
"03",4186,2015-03-16 00:00:00,4,"VALFLOWCVA",3.000,42.00,"66        ","1290","FOR G.S.T ENDFACING VICE LINER","",,,,,,"FLOW CONTROL VALVE, MODEL NO.-IL 340 1,0.2 -10 BAR, SIZE=3/4"", MAKE=SCHRADER BELLOWS, FOR TUBE MILL - 02 CUTTING CO2",,"","NOS   ",,,"","N","",,"N",.0000,.00,"N","N",""

Thursday, September 4, 2014 - 8:43:46 PM - Alex Back To Top (34391)

I have the same issue; how do we get around this?


Wednesday, May 28, 2014 - 9:38:03 AM - Inam Back To Top (31973)

Thank you. your post solved my problem.


Wednesday, October 16, 2013 - 5:51:24 AM - Bhoj Back To Top (27164)

Good One !!


Tuesday, April 16, 2013 - 5:04:03 AM - Mikel Back To Top (23368)

Hi,

Is there a way to remove the double quotes only for the columns with no data? We want to keep the double quotes for the columns that have data.


Thursday, April 4, 2013 - 11:34:02 AM - Driq Back To Top (23166)

 

Hehe..this is so simple,saves a lot of time. Thanx.


Monday, February 25, 2013 - 3:13:30 PM - Carol Back To Top (22410)

OK - I found the problem, and it's environment specific. My development environment substitutes _x0022_ in the "TextQualifier" property, and my production environment doesn't understand that... Right-click your .dtsx, view the source and look for TextQualifier. If you port it over to your production environment, fix the TextQualifier and recompile, it finally works...

Thank you Jamie Thomson!

http://sqlblog.com/blogs/jamie_thomson/archive/2011/10/11/31040.aspx

 


Thursday, February 7, 2013 - 1:06:52 AM - Carol Back To Top (21960)

My SSIS package runs great interactively from Visual Studio. However, after I deploy to SQL Server 2008 R2 and schedule it, the quotes are imported into the table.

Can anyone please help?

Thanks


Thursday, January 31, 2013 - 6:15:09 AM - Ashish Back To Top (21829)

 

How about a quote and a comma inside the quotes?

How do I parse the following txt or csv file format?

"12333","Ashish","Ashish , Patel 6"","24566"

1. 12333

2. Ashish

3. Ashish , Patel 6"

4. 24566


thanks


Thursday, January 3, 2013 - 4:50:33 PM - Suman Back To Top (21254)

Thanks a lot, this worked like a charm.

 

 

 


Friday, November 30, 2012 - 10:03:15 PM - Ben.Gu Back To Top (20681)

Thank you very much.

it worked.


Friday, November 16, 2012 - 4:58:06 PM - Joanie Back To Top (20381)

Thank you.  Worked very well!


Friday, August 3, 2012 - 11:06:49 AM - Prolay Back To Top (18911)

Thanks a lot, a little " makes a lot of difference

 


Friday, June 15, 2012 - 9:14:01 AM - FunkyKoval Back To Top (18033)

Hi

 

I have a similar problem; I also work with a file that has double quotes at the beginning and end of some columns in a csv file,

but unfortunately some lines are wrong: they start with a double quote but do NOT end with one!

Of course the data flow task fails. How do you suggest repairing the file before running the data flow task to be most effective?

Unfortunately, the files are quite big, >200MB.

 

Funky


Thursday, February 23, 2012 - 11:56:15 AM - Suresh Back To Top (16143)

 

Thanks Greg, It fixed my issue


Monday, February 20, 2012 - 9:43:19 AM - AmyNxDx Back To Top (16094)

Thanks! This simple fixed helped me TONS!!


Tuesday, May 3, 2011 - 8:27:23 PM - Adam Back To Top (13741)

D'oh!!!

Such a simple fix. I started manually pulling out the data and ran into a bunch of data conversion errors down the road.

Saved me a bunch of time. Thanks!


Wednesday, March 23, 2011 - 12:07:15 PM - Pete Back To Top (13289)

I have the same problem; how do we get around this!?


Wednesday, June 30, 2010 - 12:20:07 PM - jerryol Back To Top (5762)

Ha, ha.  When I google "undouble kirk haselden" I do get 4 hits.  Three of them point to this article, and one is junk.  Thanks anyway.


Monday, August 31, 2009 - 6:13:21 PM - MikeInNH Back To Top (3982)

 Hi William

Yeah, the tip published here doesn't work so well if the field separator (comma) is included in the quoted text. However, I believe if you Google "undouble kirk haselden" you will get about 4 hits. You probably know that Haselden is the "father of SSIS" - his solution is to use a script component to parse lines. After some Googling you can probably get the code (in Visual Basic .NET, I believe) for the component.

 Hope that helps. 

Mike 


Monday, April 7, 2008 - 9:57:46 AM - william.weber Back To Top (837)

I realize this is a very old tip, but I've been searching endlessly for a solution to my problem.

How do you deal with embedded text qualifiers in SSIS? I know that DTS dealt with this stuff correctly and I can't for the life of me figure out how to make SSIS read the file correctly. For example:

"This is field1","This is ""field2"""

 In DTS if you selected " as your text qualifier it would correctly give you the following:

This is field1, This is "field2"

In SSIS it just breaks. Any help is greatly appreciated.
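For what it's worth, the file in this question follows the common CSV convention (later written down in RFC 4180) of doubling an embedded qualifier. As a quick illustration of the parse DTS produced, not an SSIS fix, Python's csv module handles the doubling with its default doublequote=True:

```python
import csv
import io

line = '"This is field1","This is ""field2"""\n'
# doublequote=True (the default) turns each embedded "" back into a single ".
rows = list(csv.reader(io.StringIO(line), quotechar='"', doublequote=True))
print(rows[0])
```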














