How can I use file names and properties when loading files in SQL Server Integration Services (SSIS)? Is the only option writing VB or C# code? Are there any options directly available in SSIS?
I have heard several times that when loading data in SQL Server Integration Services (SSIS), a package designer needs to load not only the data itself, but also often times the name of the flat file source document that was used as the basis for the load. Some solutions offered included using a VB or C script to programmatically pull in a file name. Others have offered solutions such as completing some preprocessing of the file using PowerShell. Of course, you may be wondering what the need is? Logging and auditing is the answer. Specific user or legal requirements may request that the file name be logged and noted for some ETL processing. As with most things in SQL Server, there are actually several ways to get a job done, and, as we will walk through in this tip, there is a very easy way to get the file name!
We will use the Adventure Works databases as the basis for our examples. The 2014 versions of the regular and data warehouse databases are available on Codeplex: https://msftdbprodsamples.codeplex.com/releases/view/125550. Once you download and install the SQL Server databases, we will subsequently use SQL Server Data Tools for Business Intelligence (SSDT-BI) for Visual Studio 2013 to develop our SSIS package and SQL Server Management Studio to develop queries. You can download SSDT-BI from: http://www.microsoft.com/en-us/download/details.aspx?id=42313.
Getting and using the file name in SSIS
For this example, we are going to use a simple text file that contains a zip / postal code and city name. An example of one of the files is shown below.
We are going to load this data into a new table in the AdventureWorks database called dbo.ZipCodeTest. The table design appears in the next screen shot.
Next we will setup a simple SSIS package to load the example zip code files. First we must open SSDT-BI, and then create a new SSIS package. Next we will create two data sources for our load. As shown subsequently, the first source is to our AdventureWorks2014 database. The second source is to the flat files (text files) we will source for our source.
With the data sources created, we add two tasks to our package, as illustrated next:
- An execute SQL task to clear our table with each load; this task truncates the dbo.ZipCodeTest table
- A data flow task which moves the data from the flat files to the table; as shown in the second illustration below, the data flow sources from the flat files and puts the data in the ZipCodeTest table.
This package loads successfully with the following results.
Pretty simple? Of course, we now want to get the file name of the load. As mentioned, many folks would say to loop through the files using some sort of scripting. However it is much easier than that, but it is not that obvious! First we must either double click on the data flow task or click on the data flow tab. Next, as shown below, you must right mouse click on the flat file source and then select Show Advanced Editor.
On the advanced editor details screen, select the Component Properties tab as shown below. Next, as shown in step 2 below, in the FileNameColumnName property, create a name for the column which will hold the file name.
Finally, you will need to map the new FileNameColumnName to the appropriate field in the table.
Now looking at the results, we get the file name (along with the full path of the load file). One potential offshoot from this method is you may want to strip away the path info. You could potentially use a derived column to strip away the path information, but we will leave that functionality for another tip. However, in the next section, we will explore using the file name field as an input for a script component.
We can actually take this process one step further through the use of the Script component to now get the source files' last modified date and time. In the below screen shot, we add a derived column transformation to our data flow. This derived column, called FileDate, will be created to hold the date / time of each file.
A script transformation is then added to the data flow.
Now, we need to switch to the Input Columns window of the script transformation. Here we need to tell what columns we want to pass into the script and whether the column is ReadOnly or Read/Write. At a minimum, we need to pass in the FileNamePath as ReadOnly and the FileDate as ReadWrite. The FileDate column will get assigned a value for each row.
Last we need to edit our script. Microsoft provides us with some basic script template details within the script component. In order to interact with the file system, we need to import the Microsoft.VisualBasic.FileSystem and System.IO namespaces, as described in the below figure. Finally, we add two lines to the ProcessInputRow Sub to first get the FileNamePath (the path and file name of the file). Then we retrieve the LastWriteTime (last modified date and time) of the file and set it equal to the FileDate column.
Now when the package is executed, both the FileNamePath column and the FileDate columns are populated.
SSIS flat file data sources contain a little known property on the advanced editor that allow you to retrieve the path and file name of the flat file you are loading. Note that this property is only available on the flat file source (you will have to resort to scripting if you want, for example, an Excel file name). However, it can be used with the single file or multiple flat file sources. The FileNameColumnName property allows a SSIS developer to set the name for the FileName column in order to populate it for later use and mapping in a destination, such as a table. Furthermore, as shown in our example, we can also use it within a script component later in the data flow transformation.
- Review SSIS Tips - http://www.mssqltips.com/sql-server-tip-category/17/integration-services-development/
Last Update: 6/19/2015
About the author
View all my tips