Using the Data Profiling SQL Server Integration Services Task

By:   |   Comments (7)   |   Related: More > Integration Services Control Flow Transformations


Problem

Have you ever had to write a bunch of TSQL to do some data analysis on the table data in your database. If you have you'll know that this can become a fairly time consuming and tedious task? SQL Server 2012 Integration Services has a feature called the Data Profiling task that can help you perform this analysis much easier and faster (this feature is also available in SQL Server 2008). This task can help you find potential issues with your existing data as well as help you become more familiar with the data in a database that you have just started managing.

Solution

To use the Data Profiling SSIS task in SQL Server 2012 we first need to create a new Integration Services project in SQL Server Data Tools (formerly Business Intelligence Development Studio). Once you've created the project and opened it, your screen should look as follows with an empty package design tab.

Data Tools Initial Screen

Now that we have a package open we need to add a connection manager object for the database we want to run the profile against. To do this we right click "Connection Managers" in the Solution Explorer and select "New Connection Manager". Then select "ADO.NET" and click "Add...", then click "New..." and then fill in dialog box as show below. Once everything is filled in I like to click "Test Connection" just to make sure it's configured correctly. After this is successful you can click "OK" to complete the setup.

DB Connection Manager Details

Next we can add the Data Profiling task to the package by clicking and dragging it from the SSIS Toolbox onto the Control Flow panel.

Package With Data Profiling Task

We can start to configure the task by double clicking on the task in the Control Flow panel. A dialog box opens to the General Tab where we can setup our Profiler Output File connection. Click on the Destination property and select "<New File connection...>" and complete the dialog box as shown below. Once complete we can click "OK".

File Connection Manager Details

The complete General tab should now look like this.

Completed General Tab

From this panel we have two different methods we can use to configure what we are going to profile. First we can click on "Quick Profile" and we are presented with a form we can use to configure one or more tables to be profiled. For this example I am just going to select a single table and check all the profiles to be computed. A description of each profile can be found here. Once completed the form should look like the one below.

Completed Quick Profile Form

After clicking "OK" on the form above we are taken to the Profile Requests panel which is the other method of configuring this task and could have been selected when the initial form was opened. Looking at the screenshot below we can see all of the Profile Types we selected in the last form are configured here. Looking under the Data subsection of the Request Properties section we can see that we also have the ability to profile one or more columns within the table. As an example, if we were looking for a candidate key we may exclude some columns from the profile that we know for sure would not be part of the key.

Profile Requests Panel from Quick Profile

Let's add one more table/column by clicking on the next empty space in the "Profile Type" column. After selecting the profile type we can fill in the "Request Properties" section as shown below by select a connection manager, table/view and columns. In this case we are just going to select a single column. After clicking "OK" to complete the configuration we are taken back to the Control Flow tab of our package.

Profile Requests Panel

The configuration of our data profiling task is now complete and we can run it by selecting "Start Debugging" from the Debug menu or by using the F5 shortcut. Once it has completed successfully we have to select "Stop Debugging" from the Debug menu or use the Shift+F5 shortcut to be able to view the output.

The output file is just a regular xml file, here is a sample, and Microsoft provides two options for launching the Data Profile Viewer which displays this xml file in a nice formatted output. The first way is to double click on the Data Profiling task on the Control Flow tab and it will open up the General tab as it did when you were configuring the task. Once here we can click "Open Profile Viewer". Alternatively we can start the Data Profile Viewer as a stand-alone application. It can be found under the Start Menu -> Microsoft SQL Server 2012 -> Integration Services -> Data Profile Viewer. Once there simply open the output file that was created by the Data Profiling task. Regardless of the method you choose you will get the following window to display the contents of the xml file.

Data Profile Viewer
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, March 9, 2018 - 1:04:55 PM - Ashish Back To Top (75380)

 Hi Ben,

I was trying my data profiling task for the first time and needed a starting point. Your article help me set it up and running.

Thanks for putting it up.

 Regards,

Ashish

 


Wednesday, March 18, 2015 - 3:59:39 AM - Dave Hogg Back To Top (36569)

I'm with Boris. Usually the whole point for doing data profiling in an SSIS context is that you are investigating diverse data sources that you are looking into to build SSI ETL processes to populate your warehouse or other repository with. My experience is that those other data sources are every database you can think of. So not a good choice of a data profiling tool.

Still a good post for how to use it and so much better than the official MSDN manual :-)


Tuesday, February 19, 2013 - 12:38:24 PM - LMW Back To Top (22276)

Hmm... I've got overwrite checked, but I see duplicate rows for the same column on a second run. With the same percentages. Using 2008 though :)


Monday, February 18, 2013 - 4:25:03 PM - Ben Snaidero Back To Top (22246)

Hi LMW

It depends on how you have your destination connection setup.  It won't ever add to the data but you can set it to automatically overwrite the destination file in the properties. Otherwise you have to manually remove the file before rererunning 

Thanks for reading

Ben


Monday, February 18, 2013 - 12:13:24 PM - LMW Back To Top (22242)

Hey Ben,

If you re-run the profiler on the same destination, do you have to make a 'fresh' destination? If you don't. does it just add to the data that was already there?

 

Thanks,


Wednesday, January 16, 2013 - 11:05:19 AM - Sengwa Back To Top (21486)

Awesome..nice article. Thanks a bunch.


Wednesday, January 16, 2013 - 10:42:24 AM - Boris Tyukin Back To Top (21485)

great post, Ben! I had high hopes for SSIS data profiler but unfortunatelly it is tough to use it for real projects due to the fact that the data source is limited by ADO.NET providers.

Open source project Data Cleaner is far superior and it is free. The newest version even allows to use it for real-time data quality solutions and Kasper (develop behind it) constantly releases new versions and improves it. Check it out for sure you will fall in love with it!















get free sql tips
agree to terms