Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Parallel execution of child SQL Server Integration Services Packages using For Loop Containers


By:   |   Read Comments (43)   |   Related Tips: More > Integration Services Development

Problem

It may happen that you need to execute a child SQL Server Integration Services (SSIS) package many times like in the case of processing a large number of flat files from a directory. You can execute a child package in a sequential way using a Foreach Loop Container, however this can take a very long time. In such scenarios, running SSIS packages in parallel comes into mind, but how can this be done?  Check out this tip to learn more.

Solution

There are third party SQL Server Integration Services (SSIS) components which allow parallel execution of child packages, unfortunately these components do not work as expected every time and of course, most of these components are not free.

Therefore, in this tip we will demonstrate an alternate approach for executing child SSIS packages in parallel without the use of third party components. We will be using SSIS 2012 in Project deployment mode for this tip.

Retrieving a List of Files to Process in SQL Server Integration Services

We will use a Script Task to get a list of files to process and store this data in a Queue collection which will be further stored in the [User::DirectoryContent] variable of type Object. 

For the purpose of passing the folder location for the files, we use a project parameter [$Project::SourceFolder].

Get Directory Content Sript Task

Get Directory Content Sript Task Properties

The file retrieval is very easy using the GetFiles method of the Directory class which returns the list as a string array which we pass to the Queue<string> constructor.

var files = System.IO.Directory.GetFiles(Dts.Variables["$Project::SourceFolder"].Value.ToString(), "*.txt", SearchOption.TopDirectoryOnly);

System.Collections.Generic.Queue<string> filesQueue = new System.Collections.Generic.Queue<string>(files);

Dts.Variables["User::DirectoryContent"].Value = filesQueue;

Preparing the For Loop Container for Parallel Processing in SSIS

Once we have the file list stored in a queue, we can start getting the files out of the queue and process them. As mentioned at the beginning of this tip, the parallel processing will be done by multiple For Loop Containers. Basically we design one and copy it multiple times depending on the maximum parallelization level we want to achieve. Also we setup a project level parameter [$Project::MaximumParallelism] to allow us to limit the level of parallelization at runtime whenever necessary. The basic design of the For Loop Container will be based on the image below.

Parallel Task For Loop Container

After adding the For Loop Container to the design surface, we have to define a set of variables within the Scope of the ParallelTask.

Parallel Task variables

The [User::DoWork] variable will control when the ParallelTask For Loop Container should exit its loop cycle. Its initial value has to be set to True to allow the code to enter the loop.

The second variable [User::FileToProcess] will store the file to be processed by the child package.

The third variable [User::TaskNumber] identifies the individual instance of the For Loop Container. Its value will start with 1 and will be increased by one for each copy of the For Loop Container. This will allow us control the maximum parallelism on the fly by using the above mentioned project level parameter [$Project::MaximumParallelism]. For example we have 16 copies of the For Loop Container for a maximum of 16 parallel tasks, but the MaximumParallelism parameter will allow us to limit the number of parallel tasks at the time of package execution.

Once the variables are set up, we can setup the ParallelTask For Loop Container.

Parallel Task For Loop Settings

The only thing we need to setup in the properties of the For Loop Container is the EvalExpression. This will tell the For Loop Container when to end or even whether this loop should start at all, based on the [$Project::MaximumParallelism] project parameter setting. We set the EvalExpression as below.

@[User::DoWork] && (@[$Project::MaximumParallelism] < 1 || @[User::TaskNumber] <= @[$Project::MaximumParallelism])

The above expression means that when [User::DoWork] is false (there will be nothing to process) then the For Loop will end. Also the For Loop will not even start if [User::TaskNumber] is greater than the [$Project::MaximumParallelism]. In case the [$Project::MaximumPrallelism] is less than one, there will be no restriction on the number of parallel tasks and all copies of the For Loop Container will run.

Of course we need to take into account the number of logical processors and the MaxConcurentExecutables property setting on the Package level. We cannot achieve greater parallelism than allowed by this property.

Retrieving the File to be Processed by Child SSIS Package

The next step after configuring the ParallelTask For Loop Container is retrieval of the file to be processed in a particular loop. This is the most critical part as one file can be retrieved by only one For Loop Container and the retrieval cannot collide - it must be thread safe as each For Loop Container can be operated by a different thread. For this purpose we will use the Script Task again. We will pass the [User::DirectoryContent] as a ReadOnly variable and [User::DoWork] as well as [User::FileToProcess] as ReadWrite variables.

Get File To Process Properties

The [User::DirectoryContent] variable contains the Queue with filenames, but because the variables are passed as an object type into the script, we will cast it back to the Queue<string> type.

Once we have the queue with files, we need to do two things. First, we have to determine if there are items in the queue or not. If yes, then we de-queue a file to be processed and set the [User::DoWork] variable to true. In the other case we will set the [User::DoWork] variable to false which will cause the For Loop Container to end.

To ensure that the de-queuing is done by a single thread and also that no other thread will de-queue an item while we are checking the number of items in the queue, we have to execute the checking and de-queuing inside a critical section. We create the critical section using the lock C# statement which requires global variables which are shared among all threads. The lock statement will ensure that no thread will enter the critical section while another thread is inside the critical section. This ensures a secure way to check the number of items in the queue as well as proper de-queuing.

//Get the queue with files to process
System.Collections.Generic.Queue<string> filesQueue = (System.Collections.Generic.Queue<string>)Dts.Variables["User::DirectoryContent"].Value;

/*
    * For the Dqueuing to be thread safe, we need to ensure that multiple concurent
    * threads will not dequeue at the smae time.
    * This we ensure by dequeuing files inside the critical section using the lock
    * statement
    */
lock (filesQueue)
{
    //If number of files in queue is greater thatn 0 then dequeue file for processing
    //and set DoWork to true. Otherwise set DoWork to false as there is nothing to process
    if (filesQueue.Count > 0)
    {
        Dts.Variables["User::FileToProcess"].Value = filesQueue.Dequeue();
        Dts.Variables["User::DoWork"].Value = true;
    }
    else
    {
        Dts.Variables["User::DoWork"].Value = false;
    }
}

Executing the Child SQL Server Integration Services Package

After a file is retrieved from the queue, we can execute a child package and pass the file. To ensure that the child package is executed only if there was something retrieved from the queue, a precedence constraint with the expression has to be used.

Precedence Constraint

If the [User::DoWork] variable is true, then we execute the package. In our sample solution we will pass the [User::FileToProcess] as well as the [User::TaskNumber] into the child package. The TaskNumber will allow us to see which For Loop Container the child package was executed by.

Precedence Constraint

Parallel Execution of the Child SSIS Packages

After successful setup of the ParallelTask For Loop Container and all the child tasks, we can parallelize it by simply copying it inside a single Sequence Container. Again, as mentioned above, the maximum degree of parallelism is defined by the number of copies of this For Loop Container. Also we set an appropriate value of the [User::TaskNumber] in each copy of the container by increasing it by one. The image below shows a sample with four copies of the For Loop Container, with a maximum degree of parallelism of 4.

Parallel Tasks

Testing the Parallel Execution

For the purpose of parallel execution testing, we generated a reasonable amount of files inside a directory specified by the [$Project::SourceFolder] parameter. The parallel loops will retrieve paths to the files one by one and pass those to the child package. The child package will read content of each single file and store it with the file name and the TaskNumber to a table in a SQL Server database.  For the sample child package details, please refer to the sample project in the Next Steps section. Once we execute the master package, we will receive a result similar to the one in the image below.

Sample Execution Result

As we can see, some loops were executed multiple times prior to others. Also due to the fact that the RowID in the sample table is an identity we can also see the order of execution.

Next Steps


Last Update:






About the author
MSSQLTips author Pavel Pawlowski Pavel Pawlowski is Lead BI Specialist in Tieto with focus on Microsoft SQL Server and Microsoft BI platform.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Monday, May 02, 2016 - 6:07:36 AM - SYED SHERFUDEEN Back To Top

Dear Pavel,

 I am reading data from Flat File and each record size is 800 kb data.

In DFT, I have Flat File source, Derived Column, Row Count and OLEDB Destination,  there is not much transformation.

Even though it's throwing same error message. 

Kindly suggest me how to decrease the degree of parallelism based on my existing design.

Thanks in Advance

 

 


Thursday, April 28, 2016 - 9:14:34 AM - Pavel Pawlowski Back To Top

You are facing a memory pressure,

 

The fix depends what you are doing in the packages and on the desing of the package(s). Solution may be decrease the degree of parallelism or re-design the data flows (in terms of design or buffer allocations).

 


Thursday, April 28, 2016 - 5:28:28 AM - SYED SHERFUDEEN Back To Top

 Dear Pavel,

After reaching  170th flat file from source I am getting the following warning message and error message. Please provide your valuable reply .

 

 Warning: 0x80019002 at SEQC_FILSVR_STGSVR: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.  The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.

Warning: 0x80019002 at SOU_TO_STG_WIFI: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.  The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.

Task failed: Execute Package Task:

Error: 0xC0047012 at SRC_To_STG: A buffer failed while allocating 10484544 bytes.

Error: 0xC0047011 at SRC_To_STG: The system reports 71 percent memory load. There are 17080553472 bytes of physical memory with 4786368512 bytes free. There are 4294836224 bytes of virtual memory with 199282688 bytes free. The paging file has 34159226880 bytes with 20480581632 bytes free

 


Wednesday, April 27, 2016 - 9:29:21 AM - SYED SHERFUDEEN Back To Top

Thanks for your valuable code.

I have implemented your approach and I am gettting following error message. Please advice to fix this issue....... 

 Error: 0xC0047012 at SRC_To_STG: A buffer failed while allocating 10484544 bytes.

Error: 0xC0047011 at SRC_To_STG: The system reports 71 percent memory load. There are 17080553472 bytes of physical memory with 4786368512 bytes free. There are 4294836224 bytes of virtual memory with 199282688 bytes free. The paging file has 34159226880 bytes with 20480581632 bytes free.


Tuesday, September 29, 2015 - 10:03:20 AM - Gopi Kothandapani Back To Top

Hi Pavel,

I am using SSIS 2012 and Visual Studio 2010 version. Do I need to add any DLL or reference for this casting statement in the Script task.

1). Due to this issue, I have modified my Script task by using Oledb command to read Object variable and  successfully adding into the Queue using 'Enqueue' method. In my example, there are two items 2 and 3.

The Problem is that when I execute the package,  The first time it return value 2 and it shows Queue count as 2. But in the next run in the  For loop, the Queue reuturn no value and says Queue count as 0 and exit the loop as Dowork set as false. I am expecting the next run it suppose to return value 3.

I got feeling that the Queue is getting re-initialise (everytime) again in the second run. But in that case, it would be indefinite loop and should not come out of the loop after two iterations.

I am testing this functionality through only one For Loop. For your better understanding, I have copied the code here. Could you please check and see if you can help me out.

            OleDbDataAdapter oleDA = new OleDbDataAdapter(); 

            DataTable dt = new DataTable(); 

            DataColumn col = null; 

            DataRow row = null;           

            QueueReportIDQueue = new Queue();

            //Valid_Report_Lists is Object Variable which has two values (2,3)

            oleDA.Fill(dt, Dts.Variables["Valid_Report_Lists"].Value);

 

            foreach (DataRow row_ in dt.Rows)

            {  

                row = row_;

                foreach (DataColumn col_ in dt.Columns)

                {

                    col = col_;

                    ReportIDQueue.Enqueue(row[col.Ordinal].ToString());

                }

            }

               lock (ReportIDQueue)

                {

                    if (ReportIDQueue.Count > 0)

                    {

                        Dts.Variables["ID"].Value = Int32.Parse(ReportIDQueue.Dequeue());                    

                        Dts.Variables["DoProcess"].Value = true;

                    }

                    else

                    {

                        Dts.Variables["DoProcess"].Value = false;

                    }

                

                }

           

Dts.TaskResult = (int)ScriptResults.Success;      

 

Regards,

Gopi Kothandapani

 

 


Tuesday, September 29, 2015 - 1:57:58 AM - Pavel Pawlowski Back To Top

Hi Gopi,

 

what version of SQL Server (SSIS) you are using? On >= 2012 this shouled work without issues.


Monday, September 28, 2015 - 5:56:02 AM - Gopi Kothandapani Back To Top

Hi Pavel,

I am glad to see your post on the Parallel Processing in SSIS. I am building my solution based on your post and am facing one issue which prevent me to proceed further. Even though this has been already discussed in this thread, want to check with you on any workarounnsd.

My Package Flow is, I have For Each Loop container which has collection of items and store them in the Object type. Inisde For Each loop, I am having Script Task which assign the Object type variable to the Queue Type like below,

Queue<string> ReportIDQueue = (Queue<string>)Dts.Variables["Valid_Report_Lists"

].Value;

 

The package is failing with the below error,

"Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface."

As per your earlier comments, can acheive it through Script Components. But I need to take care this Queue assignment outside Data Flow Tasks. 

Do you think whether do we have any other workaround to acheive this through Script Tasks.

Thanks for your help.

Regards,

Gopi Kothandapani

 

 


Monday, June 15, 2015 - 5:25:14 PM - Pavel Pawlowski Back To Top

Hi Andew,

hard to say exactly what is causing your async_network_io without seeing details but in most cases the async_network_io means that something is consuming the data from the query much slower than SQL Server is capable to deliver them from the result set and so it waits for the responses from client.

In SSIS this can be caused by the fact that the SQL Server is quite quick but you have some hheavy ETL work inside the package, so the data flow buffer processing is much slower than the sql server is capable to deliver the rows and backpressure occurs. At this time the async_network_io waits will increase.

As mentioned above this is one of the most common scenarios, but of course the source can be also somewhere else.


Monday, June 15, 2015 - 1:49:52 PM - Andrew Back To Top

Thanks for the great article. I have a question about the parallelism. When i run this, each parallel task containter is calling a package that loads data from one table into another (staging into fact by year). I'm seeing a lot of async_network_io waits while the project is running. The source data, destination table and package are all on the same server. Any settings I migt be missing? Do I need to adjust max executables or max degree of parallelism? This is a production server with lots of RAM and 24 logical processors. I have implemented it the same as your example with only 4 parallel tasks. Thanks for the help.  


Saturday, April 11, 2015 - 2:14:46 AM - Raj Kumar Back To Top

Hi Pavel,

I have a situation like, I have two types of child packages

Child packages Type 1 : these are 700 packages.

Child packages Type 2 : there are 1000 packages.

All child packages use the same connections.(Source and Target database connections.)

Now I want to execute all the child packages type 1 to be executed parallaly and once Type 1 packages executed then Type 2 packages should be executed parallaly through a Master package. (Parallel execution shuould be as much as fast)

I want this Master package should be transactional based if even single child package fails then everything should be rollback.

could you please provide hints to create the Master packages.

Thank you in advance.

--Raj


Sunday, March 22, 2015 - 4:34:31 AM - Pavel Pawlowski Back To Top

Hi Ivan,

 

what kind of problems you are experiencing with the project level connection? In case of reading flat files and executing the same package multiple times, you have to use a package level connection otherwise the connecting string would be changed with each execution of the pacakge. But e.g. for the destination database if ti si same for all packages, then it is no problem to use a projedt level connection.


Friday, March 20, 2015 - 5:23:50 PM - Ivan Back To Top

Hi Pavel, thanks for the great tips!

I seem to be having a problem when I use a Project level database connection for the child package. It works fine if its a package database connection. Did you have this problem too?

Thanks,

Ivan.


Wednesday, March 11, 2015 - 3:41:52 PM - Pavel Pawlowski Back To Top

Related to the TaskNumber - You set it manually for each copy of the FoEachLoop container. It serves as container number to identify it for determining the number of parallel executions.


Wednesday, March 11, 2015 - 2:40:11 PM - Baxter Madu Back To Top

Answered my quesiton regarding TaskNumber: What I didn't realize is that it is a variable, along with DoWork and FileToProcess, that are local in scope to the ParallelTask.  I was not aware this was possible.  Thanks - this works very well.


Tuesday, March 10, 2015 - 5:00:44 PM - Baxter Madu Back To Top

My question revolves around TaskNumber - where exactly does it get incremented? The article indicates this happens in the for loops, but I don't see it happening anywhere in the instructions nor in the demo project.

 


Tuesday, October 14, 2014 - 9:14:10 AM - Pavel Pawlowski Back To Top

Hi Kumar,

here are the answers:

1)

The values are stored into a filesQueue which is then stored into a package Global variable "DirectoryContent". Then in each loop  in the script task the global variable "DirectoryContent" is casted back to the Queue object and assigned intot he filesQueue variable. This means all the loops works with the same instance of the queue and are de-queuing data from it. Because of this there is the lock - critical section which ensures that only one thread is de-queuing values from the variable.

2)

If you use the OLE DB data table the problem is that this is implemente through the COM components and you cannot easily access values in such variable in the script.

To workaround this I suggest you to use a Script Component (Destination) in which you will process the input row and store those in the queu.

This means that in the Script Component you define a Struct with the values you need to store and then store that struct in the queue. Then inside the loops you dequeue the structs back and process in parallel. Of course you have to define the structure also in each of the scrip task. (Instead of struct you can define an Interface inside the Script Component in the data flow and also create class which implements this interface. Then store the objects in the queue. When de-queuing the objects in the looop Script Tasks, you repeat the Interface declaration and get the object as the interface implementation.

So the Script component will have code similar to this:

interface IMyData
{
    int Field1 { get; }
    int Field2 { get; }
}
class MyData : IMyData
{
    public int Field1 { get; set; }
    public int Field2 { get; set; }
}
System.Collections.Generic.Queue dataQueue;

public override void PreExecute()
{
    base.PreExecute();
    dataQueue = new System.Collections.Generic.Queue();
}
    
public override void PostExecute()
{
    base.PostExecute();
    //Assign queu to global variable
    this.Variables.Data = dataQueue;
}
   
   
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    MyData d = new MyData();
    d.Field1 = Row.Field1;
    d.Field2 = Row.Field1;
    dataQueue.Enqueue(d);
}

And the Script Task something similar:

 interface IMyData
{
    int Field1 { get; }
    int Field2 { get; }
}
       
public void Main()
{
 System.Collections.Generic.Queue dataQueue = (System.Collections.Generic.Queue)Dts.Variables["User::Data"].Value;
    lock (dataQueue)
    {
        IMyData d = dataQueue.Dequeue();
        //follows data processing or returning it in a variable to parent package.
    }
 Dts.TaskResult = (int)ScriptResults.Success;
}

 Of course if you are passing a simiple string or integer value, then you do not need to implement the class and interface but simply create a queue of the base data type in the example above.

eg: System.Collections.Generic.Queue


Tuesday, October 14, 2014 - 4:26:06 AM - Kumar Back To Top

Hi Pavel, It would be great if you can clarify a few doubts of mine. 
1) System.Collections.Generic.Queue filesQueue =(System.Collections.Generic.Queue)Dts.Variables["User::DirectoryContent"].Value; Does this statement associate(link) the queue with the ssis object (changes done to one is reflected in other) or It is used to simply assign the values stored in it to a queue?
Since, this statement is executed every time inside the script task as it loops. How does it not result in an infinite loop since queue is being re-initialized every time.
2) I have tried implementing the same but got this error: "Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface."
It has been discussed before but I cannot understand your solution to implement it using data flow. I modified my script to load the queue using  oledb data table( which in turn loads the values from adodb ssis object) but it seem to be missing on many values even though the task executes successfully. 


Monday, September 15, 2014 - 8:56:27 AM - Pavel Pawlowski Back To Top

Hi M Muthuraja,

 

download the sample solution here and take a look on it. In this solution you simply substitute the "Execute Child Package" Task with the Execute SQL Task, where  you will call you stored procedure.

Basically in SSIS all the Tasks in the Control Flow which are not connected by the Precedence constraint can run in parallel based on the MaxConcurentExecutable property.


Monday, September 15, 2014 - 2:48:46 AM - M Muthuraja Back To Top

Hi Pavel,

 

I have dynamic procedure list.How can i run the multiple stored procedure parallel in ssis.

Thanks in advance.

 


Friday, September 12, 2014 - 1:08:29 AM - M Muthuraja Back To Top

Hi Pavel,

             Am having a single package using that am getting all the csv files from folder using for loop.

also using execute sql task for bulk load.so am not able to check the parallel processing.

Thanks.


Thursday, September 11, 2014 - 7:55:06 AM - Pavel Pawlowski Back To Top

Hi M Muthuraja,

 

it is easy to check.. If you run the the control package from within SSDT, the tasks being executed in parallel have the yearlow circles indicating processing. If multiple are running at the same time, they are running in parallel.

 

if you execute it from from SSIS Server, you can then check the Logs.. If multiple tasks have StartTIme/EndTime which are overlapping, then those were also executing in parallel..

You can also put some custom logging for the times easily..


Thursday, September 11, 2014 - 1:07:54 AM - M Muthuraja Back To Top

Hi pavel,

Is there any way to check the package has running in parallel?

Am not aware to check the parallelism.any thoughts would be greatly appreciated.

Thanks


Wednesday, September 10, 2014 - 5:24:14 PM - doogle Back To Top

I spoke too soon.  I figured it out.  I was going through things a little too fast and forgot to setup the parameter on the child package.


Wednesday, September 10, 2014 - 5:19:16 PM - doogle Back To Top

This is a great article!  I have one issue that I can't seem to figure out.  I setup the parameter bindings for the child package, but when I run it, the value is not passed on.  I have added a script task prior to the Execute Package Task and the value is populated.  I also added a script task at the beginning of the child package that displays the local variable and it is always null.  Any thoughts would be greatly appreciated.


Wednesday, September 10, 2014 - 2:17:35 AM - Santosh Back To Top

Hi Pavel

I need to get Files to Process(File names) from Database table coloumn instead of Directory.How can i acheive this .

Thanks in Advance


Monday, September 08, 2014 - 2:11:15 AM - Pavel Pawlowski Back To Top

Hi M Muthuraja,

 

you can use this solution. Simply instead of Execute Child Package you can put Execute SQL Task with call to your stored procedure and passing proper parameters into it.


Monday, September 08, 2014 - 1:49:51 AM - M Muthuraja Back To Top

Hi Pavel,

 

How can i run this process parallel?

could you please help.


Friday, September 05, 2014 - 9:38:24 AM - M Muthuraja Back To Top

HI Pavel,

 

I got the files to extract using bulk insert in stored procedure.and call that procedure to move all the data into corresponding tables.

 

is there any way to implement Parallel processing. i know the option Maxconcurrentexecutables=-1

 

But i couldn't check whether this process running parallel ?

 

kindly give some idea.

 

Thanks .


Friday, September 05, 2014 - 12:13:01 AM - M Muthuraja Back To Top

Hi,

am getting all the file names and content.

How to assign the table name dynamically from flatfile source to  oledb destination.

because i need to import the data from all the csv to the corresponding tables.

Thanks .

 

 


Thursday, September 04, 2014 - 2:54:30 AM - M Muthuraja Back To Top

Hi Pavel,

 

Am also trying to put multiple files dynamically in multiple tables.

for example test001, test002 ...etc to test table

and raj002,raj004,raj005 etc to raj. how can i do that.

like wise we have around 100 files to proess.

Please help on this.


Wednesday, September 03, 2014 - 12:37:31 PM - Pavel Pawlowski Back To Top

Hi M Muthuraja,

 

The sample is build on processing multiple text files from directory and put their content into DB in parallel. Only you have to remember about eventual locking then inserting in parallel into single table for eventual performance issues. (but for example parallel bulk insert into a heap are not locking itself as they are allocating new data pages).

 

The second thing related to the UTF16 Big Endian. You are true, that SSIS does not support this and this is probably because the whole .Net on which SSIS is based is using little endian byte order.

But You can use a Script Component as a  source in which you will read the files by yourselves.

Something like this in the Script Component should work:

// Read a text file saved with Big Endian Unicode encoding.
System.Text.Encoding encoding = System.Text.Encoding.BigEndianUnicode;
System.IO.StreamReader reader = new System.IO.StreamReader("TextFile.txt", encoding);
string line = reader.ReadLine();
while (line != null)
{
    //Process the line - split into columns and add it to output buffer
}


Wednesday, September 03, 2014 - 2:23:21 AM - M Muthuraja Back To Top

Hi,

 

Am trying to extract UTF 16 big endian format file ..but ssis doesn't support UTF 16 big endian format.

 

its only supports UTF 16 little endian format.is there any way to do this.please help.


Wednesday, September 03, 2014 - 2:20:22 AM - M Muthuraja Back To Top

Hi ,

     i want to extract the data from multiple csv files to oledb destnation in ssis.

csv files are dynamically generated during COB(Eod job).File count is varying based on the data.

 

for example ,we have a file test002,test003 etc..

the two files need to extract and load it to the single table called test.

 

Also data extraction will be processing parallel.how to achieve this.

Thanks in advance.


Wednesday, August 27, 2014 - 12:21:23 AM - Santosh Back To Top

Hi Pavel

Thanks for your help.


Tuesday, August 26, 2014 - 12:20:11 PM - Pavel Pawlowski Back To Top

Hi Santosh,

 

related to the checking status of each individual file upload. In general you have many possibilities, but the easiest one is simply storing the statuses in a table. Then you can easily check the status by simple query or build a report for this purpose.

Alternatively you can write records to SSIS log or flat file or whatever is available.


Tuesday, August 26, 2014 - 6:05:16 AM - Santosh Back To Top

Hi Pavel

 

Thanks a lot for you Suggestion.It really helped me.

One more help.how can i check status of each individual file upload to FTP server is sucessful or failure.

 


Monday, August 25, 2014 - 2:45:07 PM - Pavel Pawlowski Back To Top

Hi Santosh,

 

Yes, this is exactly a solution how to achieve this without any third party tools. You can put the FTP Task into the child package or even substitute the child package with the FTP Task.

In this way you will upload several files to FTP in parallel and whenever one of the transfer will finish, another one will start.

Of course, it will not upload several chunks of single file in parallel. If I remember correctly, FTP does not support multiple chunks upload, only multiple chunks download.


Monday, August 25, 2014 - 1:12:34 AM - Santosh Back To Top

Hi Pavel

I have Sql Server BI 2008. I have Task to  to upload Parallel(asnc) files(large) to Ftp Server.Is there a way to acheive this without using third party controls.

Thanks in Advance


Wednesday, May 07, 2014 - 7:08:35 AM - Selvakumar A Back To Top

I got the solution :-) Thanks


Wednesday, May 07, 2014 - 4:18:22 AM - Selvakumar A Back To Top

Hi Pavel,

My Target table structure is varbinary datatype to store file content, Since you have explained with nvarchar and I am not able to convert the data type from nvarchar to varbinary. Could you please advise me. Thanks


Regards,

Selva

 


Wednesday, April 09, 2014 - 12:40:05 PM - Pavel Pawlowski Back To Top

Hi Chaitanya,

How you are assigning the data in the table to the queue?

In fact there two possibilities how to work with data coming from table.

  1. Use a Data Flow and use a script destination to put the data into the queue using script destination - then nothing changes in the actual parallel processing
  2. Using the recordsed (in data flow or the result of execute SQL Task). However this involves copying the pure COM Ado recordset over to an OleDB table or recordset and have it's own issues etc.

So I suggest you read the data in data flow, store them in a queue and then process in parallel.


Tuesday, April 08, 2014 - 6:36:24 PM - Chaitanya Bheri Back To Top

Hi Pavel,

This is a great article to implement parallelism without third party tools like cozyroc parallel loop task.

I had a similar task, but the values in the object variable (DirectoryContent) extracts the data from a table.

 

System.Collections.Generic.Queue filesQueue = (System.Collections.Generic.Queue)Dts.Variables["User::DirectoryContent"].Value;

I ended up with the following error message. 

Error Message :-

Unable to cast COM object of type 'System.__ComObject' to class type 'System.Collections.Generic.Queue`1[System.String]'. Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface.


Any workarounds for this please.



Sunday, January 12, 2014 - 10:51:47 AM - Ramone Hamilton Back To Top

I just wanted to let you know that this was very helpful to me in a project that I am currently working on.  I am using SSIS 2005 so I had to convert some of the code to VB but this worked perfectly.  I had be scouring the net trying to find a descent way to implement queues within SSIS and kept getting stuff about the MSQueue Task.  Great Job.


Learn more about SQL Server tools