Learn more about SQL Server tools

solving sql server problems for millions of dbas and developers since 2006
join MSSQLTips for free SQL Server tips

































Top SQL Server Tools






















Loop through Flat Files in SQL Server Integration Services

MSSQLTips author Tim Smith By:   |   Read Comments (14)   |   Related Tips: More > Integration Services Development
Problem

Several executives tell us that they will be collecting data from thousands CSV or text files and we should expect to see at least ten new files a day. Instead of trying to import each file one by one, how can we create an SSIS package that pulls the data from each file into SQL Server while being able to handle a large number of files and data?  Check out this tip to learn more.

Solution

Before we get address the solution for this tip, I want to be clear that this tip solely addresses CSV and text files, not Excel. Excel requires a similar process to this, which is not covered, and also assumes that the Excel file formats are identical (Excel allows users to change formats, often creating a headache with SSIS).  Click here and here to learn more about working with Excel.

Now to the solution, we can load CSV and text files easily in SQL Server Integration Services (SSIS) without requiring much time, energy and also in a way that's constructed to handle more data growth.  Let's jump right into the Business Intelligence Development Studio or Visual Studio to get started.  Here are the SSIS Packages I have created for this tip, feel free to use them as a point of reference as you read this tip (note that one adds a derived column to also import the file name, the other is generic and allows you to adjust as you need).

1. Start with a Foreach Loop Container as shown in the first image below. We will place a few tasks inside of this container momentarily. Also, we'll create two string variables, one called ArchiveFolder and the other CurrentFile as shown in the second image below. The CurrentFile variable should be empty, while the ArchiveFolder variable should have the location where we want our files to be moved after the data are imported.  To access the variables window navigate to View | Other Windows | Variables.

Step One: Foreach Loop container

Step One: Add two string variables

2. Next, we double click the For each Loop Container. On the Collection tab (the first image below), we'll enter the path where we are going to import the files from (in the Folder text box) - for instance, C:\OurFolder. In the Files text box, we'll enter the extension of the files (and select the "Fully qualified" radio button) - for instance, *.csv* would mean that we're importing CSV files. Next click we'll, click Variable Mappings tab (in the second image below), and insert the variable CurrentFile with an Index of 0 (should be default on the index).

Step Two: Collections Tab

Step Two: Variable Mappings Tab

3. Next, we'll insert three other tasks inside the For each Loop Container: a Data Flow Task, an Execute SQL Task, and a File System Task (as shown in the first image). All of these tasks should have the green arrows pointed to the next task as this will be our process as the looping begins. Note that the Execute SQL Task can go outside of the loop as well (meaning, that we'll manipulate the data after the import process as shown in the second image). While we can perform data manipulation inside of SSIS, depending on our experience, we can also manipulate it through code. The advantage of code is that it can handle changes better - for instance, executives telling us that they want the date formatted differently.

Step Three: Data Flow Task, Execute SQL Task, File System Task inside the Foreach Loop

Step Three: Data Flow Task, Execute SQL Task inside the Foreach Loop

4. Inside of the Data Flow Task, add a Flat File Source and an OLE DB Destination. For the Flat File Source, we are going to initially select one of our flat files and use that as a guideline to create flat file connection. Once the Flat File Connection is listed under the Connection Managers (at the bottom, in this example called "OurFlatFileConnection"), we will change the Connection String Expression to "CurrentFile" so that it points to the current file during the loop. For "OurMSSQLConnection", we'll enter our server and database name and inside of the OLE DB Destination, point the flow to the right table.

Step Four: Flat File Source to OLE DB Destination

Step Four: OurFlatFileConnection string is the variable CurrentFile

Step Four: Select the database

5. Depending on where we set our Execute SQL Task, we'll adjust its properties as well - in this case calling our stored procedure to clean up the data. For instance, if we have chemical data that only need to show the fourth decimal place, and our text or CSV files have seven decimals places, we can clean the data with a stored procedure to round to the nearest fourth decimal place (for instance: CAST(ROUND(chemicals,4) AS DECIMAL(7,4)). We could also have the stored procedure transfer data from a staging (or temporary) table into a main table to ensure that our data is good, as well as keep a receipt of bad data.

With SSIS and data imports, keeping receipts of bad data will also help us communicate with the data sources where bad data are originating and this may help them identify problems in their systems.

Step Five: Execute SQL Task for cleaning your data

6. The final step, we'll move the file to our archive folder. We do this by selecting the Destination Connection as the variable ArchiveFolder and setting the Source Connection as OurFlatFileConnection. Whatever the current file is during the loop process, that will be the file that is moved into the archive folder.

Step Six: File System Task
Next Steps


Last Update: 3/4/2013


About the author
MSSQLTips author Tim Smith
Tim Smith works as a database administrator and developer for SQLWatchmen. He also programs with Ruby on Rails.

View all my tips


print tip Print  
Become a paid author




Recommended For You








Learn more about SQL Server tools
Comments and Feedback:
Monday, March 04, 2013 - 12:43:56 PM - Sik Read The Tip

Awesome Explanation!


Wednesday, April 03, 2013 - 1:40:02 PM - Tim Sullivan Read The Tip

Sure would be nice if y'all had a link to print the tip & it's related tips in one fell swoop. I can only cut so much time out of my work day for learning. A printed version, paper or PDF, could ride hoem with me for later reading.

 

PS thank you for not using Captiva, it usually taked a half dozen attempts to get the code right :-)


Wednesday, April 03, 2013 - 3:31:32 PM - Greg Robidoux Read The Tip

@Tim Sullivan - there is a print option for all tips, but there is not a way to print multiple at one time.  If you look above the "Comments and Feedback" section there is an option that will all you to print a clean version of the tip.

Regards,
Greg


Wednesday, April 03, 2013 - 8:38:25 PM - Tim Sullivan Read The Tip

Thanx, I see it now!!


Friday, July 05, 2013 - 3:00:57 AM - Mpumelelo Read The Tip

I use a ForEachFolder Loop to loop through close to 15 folders. Each folder has up to 25 csv files. The files have similar names in each of the 15 folders. What I noticed was that during each iteration the foreach folder loop does not read all the data from a file. For example, I have a file named FACT_Address.csv with 2287 records. When i run the for each folder container only 889 are loaded and there are no errors whatsoever. But if I run the task individually, all the 2287 rows are loaded.

 

Has anyone ever faced a similar problem with the ForEachFolder Loop?


Wednesday, January 29, 2014 - 1:36:50 PM - Priya Read The Tip

Hello All,

I am struck at Property expression editor place. Could anyone please help where to find that window?

Thanks

Priya


Thursday, February 13, 2014 - 6:14:12 PM - RW Read The Tip

Like Priya, I'm stuck at the Property Expression editor place.  I'd appreciate is someone could explain further what that section is doing and how to get to the screens to do it.


Thursday, February 13, 2014 - 11:04:56 PM - RW Read The Tip

Figured it out.  It's in the properties of the flat file connection.  Select the ... next to the expressions property and you should be able to add the connectionstring property with the current file expression.


Friday, February 21, 2014 - 1:36:16 PM - dina Read The Tip

I created the same package as above in ssis 2012 and it works in ssdt perfect. I deployed and when I execute it succeeds but It's not moving the files or anything. When I read the report that comes out I get a warning from the foreachloop saying it's empty. But it's not because when I run it from SSDT it works and transfers the files.  Did I forget to deploy something I only have variables. Am I suppose to create paramters? Please help


Friday, March 07, 2014 - 12:53:36 PM - Tapan Read The Tip

I would like to know how to run this task periodically like weekly / monthly?

Also is it possible to write the file names to a different table and check if a particular file has been read or not before being imported?

 


Friday, March 14, 2014 - 8:45:38 AM - Tim Read The Tip

@dina  In testing on SSDT, make sure that it matches how it will run in production.  If files aren't moving and data aren't importing, that usually means that something doesn't match between the test environment and production.  Also, for the foreach loop, you want to make sure that a parameter exists (show above) which will hold the file currently being imported, so if you have no parameters in your package, see steps 1, 2, and 3.

@Tapan  It depends on your environment, how often you receive the files, and how often you'll need the data.  For instance, if you need the data weekly, then you want to do it at least weekly.  If you go to my GitHub account (link: https://github.com/tmmtsmith/SSISLoopWithFileName), it has a demo SSIS package that also grabs the file name, so yes, it's possible.


Wednesday, April 09, 2014 - 2:34:31 PM - Zahid Read The Tip

I am looking to do the following.

My SSIS package is already working with a hardcoded filename from the folder as a flat file source. The issue I am coming across is, there will be multiple files with the date as the filename. My SSIS package should only import the most recent file and transfer it over to the db. How can I achieve that?

 

My SO question is here: http://stackoverflow.com/questions/22969449/how-to-use-the-latest-file-in-a-folder-for-source


Thursday, April 10, 2014 - 3:12:33 PM - Tim Read The Tip

If you're calling your package in a job step, you create a job step before calling the package and set the step to use PowerShell and call this function, https://github.com/tmmtsmith/Powershell/blob/master/RemoveOlderItems.ps1, by passing in the file location of these files.  This function will remove the oldest files in the directory, and keep the most recent, so that you only import the most recent file.

 


Wednesday, April 16, 2014 - 9:00:24 PM - Richard Read The Tip

Thanks very much for putting this together. I have been putting off converting some old DTS packages that have been playing up since moving to SQL 2008 as I wasn't looking forward to converting the Active X scripts that do this task. I was able to get it running within an hour with the help of your article so no longer need Active X!

 

It was also my first play with SSIS 2012 and it is much better than SSIS 2008!



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email Notify for updates

Signup for our newsletter


Comments
*Enter Code refresh code


 
Sponsor Information







Copyright (c) 2006-2014 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups | community | events | first timer?
Some names and products listed are the registered trademarks of their respective owners.