Loop through Flat Files in SQL Server Integration Services

By:   |   Comments (29)   |   Related: More > Integration Services Development


Problem

Several executives tell us that they will be collecting data from thousands CSV or text files and we should expect to see at least ten new files a day. Instead of trying to import each file one by one, how can we create an SSIS package that pulls the data from each file into SQL Server while being able to handle a large number of files and data?  Check out this tip to learn more.

Solution

Before we get address the solution for this tip, I want to be clear that this tip solely addresses CSV and text files, not Excel. Excel requires a similar process to this, which is not covered, and also assumes that the Excel file formats are identical (Excel allows users to change formats, often creating a headache with SSIS).  Click here and here to learn more about working with Excel.

Now to the solution, we can load CSV and text files easily in SQL Server Integration Services (SSIS) without requiring much time, energy and also in a way that's constructed to handle more data growth.  Let's jump right into the Business Intelligence Development Studio or Visual Studio to get started.  Here are the SSIS Packages I have created for this tip, feel free to use them as a point of reference as you read this tip (note that one adds a derived column to also import the file name, the other is generic and allows you to adjust as you need).

1. Start with a Foreach Loop Container as shown in the first image below. We will place a few tasks inside of this container momentarily. Also, we'll create two string variables, one called ArchiveFolder and the other CurrentFile as shown in the second image below. The CurrentFile variable should be empty, while the ArchiveFolder variable should have the location where we want our files to be moved after the data are imported.  To access the variables window navigate to View | Other Windows | Variables.

Step One: Foreach Loop container

Step One: Add two string variables

2. Next, we double click the For each Loop Container. On the Collection tab (the first image below), we'll enter the path where we are going to import the files from (in the Folder text box) - for instance, C:\OurFolder. In the Files text box, we'll enter the extension of the files (and select the "Fully qualified" radio button) - for instance, *.csv* would mean that we're importing CSV files. Next click we'll, click Variable Mappings tab (in the second image below), and insert the variable CurrentFile with an Index of 0 (should be default on the index).

Step Two: Collections Tab

Step Two: Variable Mappings Tab

3. Next, we'll insert three other tasks inside the For each Loop Container: a Data Flow Task, an Execute SQL Task, and a File System Task (as shown in the first image). All of these tasks should have the green arrows pointed to the next task as this will be our process as the looping begins. Note that the Execute SQL Task can go outside of the loop as well (meaning, that we'll manipulate the data after the import process as shown in the second image). While we can perform data manipulation inside of SSIS, depending on our experience, we can also manipulate it through code. The advantage of code is that it can handle changes better - for instance, executives telling us that they want the date formatted differently.

Step Three: Data Flow Task, Execute SQL Task, File System Task inside the Foreach Loop

Step Three: Data Flow Task, Execute SQL Task inside the Foreach Loop

4. Inside of the Data Flow Task, add a Flat File Source and an OLE DB Destination. For the Flat File Source, we are going to initially select one of our flat files and use that as a guideline to create flat file connection. Once the Flat File Connection is listed under the Connection Managers (at the bottom, in this example called "OurFlatFileConnection"), we will change the Connection String Expression to "CurrentFile" so that it points to the current file during the loop. For "OurMSSQLConnection", we'll enter our server and database name and inside of the OLE DB Destination, point the flow to the right table.

Step Four: Flat File Source to OLE DB Destination

Step Four: OurFlatFileConnection string is the variable CurrentFile

Step Four: Select the database

5. Depending on where we set our Execute SQL Task, we'll adjust its properties as well - in this case calling our stored procedure to clean up the data. For instance, if we have chemical data that only need to show the fourth decimal place, and our text or CSV files have seven decimals places, we can clean the data with a stored procedure to round to the nearest fourth decimal place (for instance: CAST(ROUND(chemicals,4) AS DECIMAL(7,4)). We could also have the stored procedure transfer data from a staging (or temporary) table into a main table to ensure that our data is good, as well as keep a receipt of bad data.

With SSIS and data imports, keeping receipts of bad data will also help us communicate with the data sources where bad data are originating and this may help them identify problems in their systems.

Step Five: Execute SQL Task for cleaning your data

6. The final step, we'll move the file to our archive folder. We do this by selecting the Destination Connection as the variable ArchiveFolder and setting the Source Connection as OurFlatFileConnection. Whatever the current file is during the loop process, that will be the file that is moved into the archive folder.

Step Six: File System Task
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Tim Smith Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, March 23, 2021 - 5:08:32 PM - R. Alex Back To Top (88453)
It took me some time to figure it out at 100% but once I got it, I've been able to do this and make my own enhanced version, multiple loops at once, and many more, thanks a lot

Wednesday, February 24, 2021 - 11:00:09 AM - Robert Back To Top (88293)
If you have any problems getting the variables show up on the Flat File Connection manager, ensure that this is scoped to the package and not the project. Project connection will show params only

Thursday, May 30, 2019 - 2:19:44 PM - Todd Back To Top (81264)

This helped a lot.  I also got stuck trying to update the connection string to @[User::CurrentFile], but I finally found it.

Right-click the flat file connection and select properties

Don't click the button "..." beside connection string

Instead, scroll down to Expressions and click ...

It will open the window that is shown in step 4

Just make the window look like the pictures above.

Hopefully this helps.

Todd


Wednesday, February 7, 2018 - 3:35:44 PM - Rob Back To Top (75131)

 So on #4.. I am using a fixed width file. (does that matter?) When I try to change the connection to the variable I cannot find a way to pull up the expression window. I just get the connection manager. The is no property extension editor. Sorry I am pretty new to SSIS using 2008. Thanks

 


Sunday, November 5, 2017 - 6:55:23 PM - Tim Back To Top (69260)

@Graham Gilvar

 

The execute sql task is where you would perform further operations - such as data validation, if necessary.  If you dont't need to do any of this, you can remove it and let it go to the next step (move file).


Sunday, November 5, 2017 - 6:48:19 PM - Tim Back To Top (69258)

@Alex & Jon C:

It depends on the version you're using.  Newer versions of SSIS do not have this option on the properties.  Once a connection is set up to a flat file, the flat file will appear at the the bottom of the screen and the option parameterize has the property ConnectionString, Use existing parameter (which will need to be created first), which the foreach loop will use.

The above images show older version of SSIS, which make it easier to edit the properties in the task itself.


Saturday, November 4, 2017 - 11:31:08 PM - Alex Back To Top (69221)

good tutorial, but like 3 or 4 others here , step 4 is not clear, and can not proceeed. which is a shame 

 


Thursday, September 28, 2017 - 8:52:30 AM - Tim Back To Top (66681)

@BIlal

I would either use an SSIS package with multiple data flows that accounts for each file type and an accompanying table, or I would use an auto-import process that automatically creates a table based on a file's format.  The latter would only be useful if these types of differing files weren't consistently imported (ie: a one time import).  If you regurlarly import files with 2, 5 and 20 columns, then a process which handles those files repetitively is a better choice in my opinion.


Thursday, September 28, 2017 - 2:30:58 AM - BIlal Back To Top (66671)

 Hi Tim Smith,

                   What happens when the columns in each files are different, lets say file1 has 2 columns,file2 has 5 columns,file3 has 20 columns,and we want to create differenect table for each file. what will we do in this case? Thanks in advance.

 


Thursday, August 24, 2017 - 6:20:05 PM - Jon C Back To Top (65257)

 Hello, I'm afraid after Step 4, I don't see how you access the Property Expression Editor. I have been searching online for some time now to no avail. Thank you in advance for your help.

- Jon

 


Wednesday, July 15, 2015 - 9:01:39 AM - mahesh Back To Top (38213)

Great tutorial!!

 

Thank you for posting this tutorial, it helped me a lot to understand how to use different comonents in SSIS.

 

 


Tuesday, June 23, 2015 - 4:30:05 PM - Graham Gilvar Back To Top (38000)

Great job! This tutorial has done wonders for me, but I am stuck. In the Execute SQL Task Editor (just below step 5), what am I supposed to insert for the SQL Statement? Where yours says EXEC sp_OurSQL_Task. What is that code supposed to do? I have nothing written there and when I try to run the program it comes back with a validation error at the Execute SQL task, so I'm thinking that's the problem.


Friday, May 23, 2014 - 2:51:18 PM - Tim Back To Top (30909)

@Richard and @Patrick Awesome to hear.

@Kevin This specific SSIS approach won't perform that; I wrote an article for MSSQLTips showing how to do what you mentioned in PowerShell, at this link.


Friday, May 23, 2014 - 11:36:25 AM - Kevin Back To Top (30906)

For this particular example, do all of the Flat Files have to be of the same format?

And are all of the files being imported into the same destination table?

What if I have Flat Files of varying formats/number of columns, and want each to be imported into it's own table.

For instance an Accounts file, Customer file, Contacts file, etc ....and want a package that will iterate through all of them and Import them into their own tables such as dbo.Accounts, dbo.Customer, dbo.Contacts ...etc ...?


Thursday, April 24, 2014 - 9:05:47 AM - Patrick Conley Back To Top (30498)

Absolutely Amazing! Was tasked with re-creating someones 15 year old Access 2003 front end, SQL 2005 backend app and was only provided 1 text document and 2 Excel workbooks for all of their data to recreate. Most went smoothly on the import until the 200,000 line table import where SQL inherent SSIS package would fail consistently even with the source file broken down into several small chunks. This setup and documentation saved me about 6 hours of manual error filtering.

 

Thank you!


Wednesday, April 16, 2014 - 9:00:24 PM - Richard Back To Top (30085)

Thanks very much for putting this together. I have been putting off converting some old DTS packages that have been playing up since moving to SQL 2008 as I wasn't looking forward to converting the Active X scripts that do this task. I was able to get it running within an hour with the help of your article so no longer need Active X!

 

It was also my first play with SSIS 2012 and it is much better than SSIS 2008!


Thursday, April 10, 2014 - 3:12:33 PM - Tim Back To Top (30037)

If you're calling your package in a job step, you create a job step before calling the package and set the step to use PowerShell and call this function, https://github.com/tmmtsmith/Powershell/blob/master/RemoveOlderItems.ps1, by passing in the file location of these files.  This function will remove the oldest files in the directory, and keep the most recent, so that you only import the most recent file.

 


Wednesday, April 9, 2014 - 2:34:31 PM - Zahid Back To Top (30024)

I am looking to do the following.

My SSIS package is already working with a hardcoded filename from the folder as a flat file source. The issue I am coming across is, there will be multiple files with the date as the filename. My SSIS package should only import the most recent file and transfer it over to the db. How can I achieve that?

 

My SO question is here: http://stackoverflow.com/questions/22969449/how-to-use-the-latest-file-in-a-folder-for-source


Friday, March 14, 2014 - 8:45:38 AM - Tim Back To Top (29757)

@dina  In testing on SSDT, make sure that it matches how it will run in production.  If files aren't moving and data aren't importing, that usually means that something doesn't match between the test environment and production.  Also, for the foreach loop, you want to make sure that a parameter exists (show above) which will hold the file currently being imported, so if you have no parameters in your package, see steps 1, 2, and 3.

@Tapan  It depends on your environment, how often you receive the files, and how often you'll need the data.  For instance, if you need the data weekly, then you want to do it at least weekly.  If you go to my GitHub account (link: https://github.com/tmmtsmith/SSISLoopWithFileName), it has a demo SSIS package that also grabs the file name, so yes, it's possible.


Friday, March 7, 2014 - 12:53:36 PM - Tapan Back To Top (29681)

I would like to know how to run this task periodically like weekly / monthly?

Also is it possible to write the file names to a different table and check if a particular file has been read or not before being imported?

 


Friday, February 21, 2014 - 1:36:16 PM - dina Back To Top (29533)

I created the same package as above in ssis 2012 and it works in ssdt perfect. I deployed and when I execute it succeeds but It's not moving the files or anything. When I read the report that comes out I get a warning from the foreachloop saying it's empty. But it's not because when I run it from SSDT it works and transfers the files.  Did I forget to deploy something I only have variables. Am I suppose to create paramters? Please help


Thursday, February 13, 2014 - 11:04:56 PM - RW Back To Top (29439)

Figured it out.  It's in the properties of the flat file connection.  Select the ... next to the expressions property and you should be able to add the connectionstring property with the current file expression.


Thursday, February 13, 2014 - 6:14:12 PM - RW Back To Top (29436)

Like Priya, I'm stuck at the Property Expression editor place.  I'd appreciate is someone could explain further what that section is doing and how to get to the screens to do it.


Wednesday, January 29, 2014 - 1:36:50 PM - Priya Back To Top (29273)

Hello All,

I am struck at Property expression editor place. Could anyone please help where to find that window?

Thanks

Priya


Friday, July 5, 2013 - 3:00:57 AM - Mpumelelo Back To Top (25718)

I use a ForEachFolder Loop to loop through close to 15 folders. Each folder has up to 25 csv files. The files have similar names in each of the 15 folders. What I noticed was that during each iteration the foreach folder loop does not read all the data from a file. For example, I have a file named FACT_Address.csv with 2287 records. When i run the for each folder container only 889 are loaded and there are no errors whatsoever. But if I run the task individually, all the 2287 rows are loaded.

 

Has anyone ever faced a similar problem with the ForEachFolder Loop?


Wednesday, April 3, 2013 - 8:38:25 PM - Tim Sullivan Back To Top (23148)

Thanx, I see it now!!


Wednesday, April 3, 2013 - 3:31:32 PM - Greg Robidoux Back To Top (23147)

@Tim Sullivan - there is a print option for all tips, but there is not a way to print multiple at one time.  If you look above the "Comments and Feedback" section there is an option that will all you to print a clean version of the tip.

Regards,
Greg


Wednesday, April 3, 2013 - 1:40:02 PM - Tim Sullivan Back To Top (23144)

Sure would be nice if y'all had a link to print the tip & it's related tips in one fell swoop. I can only cut so much time out of my work day for learning. A printed version, paper or PDF, could ride hoem with me for later reading.

 

PS thank you for not using Captiva, it usually taked a half dozen attempts to get the code right :-)


Monday, March 4, 2013 - 12:43:56 PM - Sik Back To Top (22553)

Awesome Explanation!















get free sql tips
agree to terms