By: Nat Sundar | Comments (9) | Related: More > Integration Services Development
Problem
SQL Server Integration Services (SSIS) does not have a default task to poll a folder to see if a specific file exists. There are third party controls that can assist us with this task, but not all organizations allow the use of third party controls in SSIS. In this tip, I will show how you can build an SSIS process to poll for a file.
Solution
Polling for a file is a common task used in ETL solutions. Polling means an application continuously checks for a change in the status of an external process by looking for a file whether it is an empty file or the file has actual data. Most ETL solutions need to poll a specific directory for a file before moving to the next step, but unfortunately SSIS does not have a default task to achieve this.
There are various scheduling tools available in the market to poll a file or a directory and once the file is available the scheduling tool can execute a SSIS package. However these scheduling tools are very expensive and not preferred by all organizations.
Another approach is to use third party controls for SSIS. However use of third party controls is not always permitted in organizations. In addition, the installation of these components requires manual intervention to update the SSIS Packages and activate the license.
Hence I have come up with an alternative solution.
Solution Overview
This solution will use only the built-in tasks available in SSIS. This solution will use a For Loop and a Script Task component to poll for a file. Until it finds the file in the given folder, the For Loop¯ will run again and again. The Script Task has been placed inside the For Loop and for every iteration of the For Loop the Script Task will be executed.
The script task will check for the files in the given folder. If found, the script task will return the name of the file in a message box.
In addition the script task will reset a flag for the For Loop, so the For Loop¯ terminates on the next iteration.
A delay can be introduced in the script task after every iteration, so the polling can wait for a reasonable period of time before checking again.
Detailed Solution
So that you can understand the basics and you will be able to change the solution as per your requirements, I have created two user variables "FolderLocation" and "IsFileExists". These variables and their datatypes can be seen in the below picture.
Package Structure
This sample package has a For Loop¯ which contains a Script Task. The below picture highlights the package at very high level.
The below picture highlights the "For Loop"¯ configuration.
We have to configure the Script Task to read the variable "FolderLocation" and to configure the variable "IsFileExists" to read/write. The below picture highlights the configuration in detail.
Now let's write some basic code inside the script task. The below picture represents the actual code for the script task. As you can see this code simply pops up a message box.
Warning
As we haven't set the package to terminate, it will run an infinite number of times. When you execute the package, it is expected that the message box will appear again and again. You may need to terminate the execution to stop the package. The below image represents the package execution in an infinite loop.
Terminate the package after single iteration
The package can be terminated by setting the flag "IsFileExists"¯ to true as shown in the below image.
When this runs, the package will complete in the first iteration as per the image below.
Let's modify the script task to poll a file in the given folder. To use appropriate ".Net assemblies"¯, we need to refer the library "system.io" in the namespaces section. This is highlighted in the image below.
The code will look for a file in the given directory. If found, it will be return the name of the file and the package will terminate. The actual code can be referenced in the below picture
Now let's create a file in the folder location as per the image below.
A breakpoint can be set on the code to debug and investigate. During the execution the break point will be hit and the value of the variable can be validated by placing the cursor over the variable. The below picture represents the debug procedure.
After successful execution, it will display the message box as shown below. (Note you can copy the code further on in the tip.)
Enhancing the Solution
It is possible that the data file may be delivered a little later than the expected, so the package should wait for some time before the next iteration. This can be done by introducing a delay in the script task. We need a new variable to store the value of the delay in milliseconds. The variable has been created and a value of 1000 has been provided. This can be seen in the picture below.
As this variable will be read by the script task, this needs to be referenced in the script task as shown below in the ReadOnlyVariables section.
The below represents the modified code to introduce a delay using "Thread.Sleep"¯.
public void Main() { string DataFilesLocation; string[] DataFiles; Int32 DelayTimer; DelayTimer = (Int32)Dts.Variables["User::DelayTimerInMs"].Value; DataFilesLocation = Dts.Variables["User::FolderLocation"].Value.ToString(); DataFiles = Directory.GetFiles(DataFilesLocation); if (DataFiles.Length > 0) { MessageBox.Show("Data File Name: " + DataFiles[0].ToString()); Dts.Variables.["User::IsFileExists"].Value = true; } else Thread.Sleep(DelayTimer); Dts.TaskResult = (int)ScriptResults.Success; }
The below image represents waiting after an iteration.
Once a file has been found in the folder, the script task identifies the file and displays the name in the message box.
Scope For Improvement
This solution can be extended further to stop the package execution by terminating the "For Loop"¯ after a set amount of time (say after an hour or so). This can be easily done by finding the difference between the current time and the start time of the package. The start time of the package can be derived using the system variable "StartTime".
Summary
It is always recommended to terminate a package after a period of time instead of having it run forever. This can be managed as mentioned in the scope for improvement section. In addition, a SQL Server Agent Job can be scheduled to run the package again after a specific time interval.
Next Steps
- Read more about the SSIS File Watcher here.
- Learn more about script task here.
- Learn about GetFiles function here.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips