Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Powerful SQL Server Data Cleansing and Processing


By:   |   Last Updated: 2019-04-10   |   Comments   |   Related Tips: More > Application Development

Problem

Our organization receives and transmits data on a daily basis for business-critical processes to numerous partners, vendors, customers, suppliers and more.  Our data needs to be validated and cleansed prior to loading the data to our SQL Server environments.  We have made significant investments with our SQL Server DBAs and Developers to automate the processes as much as possible. Unfortunately, we are constantly shooting at a moving target i.e. file formats differ between each system, we often get unexpected file formats (data is added, removed or re-ordered), new data is being requested between the organizations and more unpredictable circumstances.  On top of this, we need to ensure all of the data is properly vetted by the Data Stewards intimately knowledgeable on the data.

The constant data management moving target tends to overwhelm the SQL Server DBAs and Developers with the need for immediate response of updating the validation and import code to then load the data to meet the business requirements between the organizations.  How can we have the best of both worlds?  Have our SQL Server DBAs and Developers build a system to support the organization, while providing our Data Stewards the flexibility to validate and cleanse the data prior to the production load without having the deep programming knowledge of our technical team.

Solution

With SQL Server Data Platform, most DBAs and Developers turn to either SQL Server Integration Services (SSIS), Azure Data Factory (ADF), BIML, XML, JSON or T-SQL code.  With standardized files, formats and data, these technologies are able to automate tedious processes.  The SQL Server technologists are able to shine and show their value in terms of time savings for the entire organization.

Unfortunately, many processes are difficult to automate due daily changes between systems perhaps due to constantly changing marketing programs, systems updates, short term programs, shifting product lines, fulfillment via numerous vendors and more.  From a technology perspective, this can be considered an SSIS Nightmare with hundreds of unwieldy packages, version control debacle and countless hours trying to automate processes that are rapidly evolving.  These situations quickly become frustrating for the SQL Server technologists and can benefit from the collaboration with Application Owners, Data Stewards and Power Users, who can decipher the data and interface with the other organizations to work through the process.

So how do we avoid the SSIS Nightmare?  Let the technologists automate standardized processes and give the Data Stewards a solution to cleanse, validate and import data without becoming programmers?  One solution, that I would like to introduce is Unison from Melissa, a solution built for Data Stewards enabling them to validate, cleanse, import and export data via an intuitive project-oriented framework.  Unison delivers:

  • Intuitive three step process to perform data cleansing
  • Simple point and click interface to profile, cleanse, standardize, enrich, match, merge and validate data
  • Highly refined data quality algorithms to ensure your data is accurate with monthly meta data updates
  • Address verification for 240 countries, Name Parsing and more
  • Import and export data from major databases (SQL Server, Oracle and MySQL), Excel and flat files
  • Intuitive web-based interface without programming expertise
  • Collaboration among business users to ensure critical business processes are met
  • Project oriented paradigm for re-use, modification and recurring scheduling
  • Fast data processing - 50 million address records per hour
  • Architecture designed for on-premises deployments of container technology to meet internal security and performance requirements

Let’s dive in to see Unison in action.

Getting Started with Unison Data Quality for Data Stewards

To get started with cleansing and validating your data with Unison, login to the dashboard to have access to Projects, Jobs and administrative interfaces such as logs, user administration, meta data updates and more.  Start a new project by clicking on the Projects link in the left navigation and select the New option.  This begins the process by naming the project then specifying whether the project will be public or private.

Getting Started with Unison Data Quality for Data Stewards

Once the project is named, three core steps are followed to complete the project:

  1. Define the Input Source as either a text file or database connection
  2. Configure the data validation Tasks
  3. Specify Output Source to either a text file or database connection

Both the Input and Output Sources can be a flat file or database connection to SQL Server, Oracle or MySQL.  Below is the interface to create a database connection in your project.

Create a new database connection in Unison

In this project, we will drag and drop a text file from Windows Explorer into Unison and Unison will read the file header as well as preview the first three records as shown below.  Unison includes intelligent file detection, but if needed the text file properties can be configured in terms of the delimiter, field enclosure such as double quotes or if the file is a fixed width.

Unison initial data validation

Once configured, the Project looks like this:

Unison Project that is a work in progress

Unison Data Validation Tasks

The true value of Unison is apparent with the data validation tasks.  There are currently four available tasks including Name Parsing, Phone, Email and Address Cleansing which includes 240 countries.  Coming soon will be Matching, Data Profiler and Data Cleansing that are currently supported in complimentary Melissa solutions.  For Unison, simply click on any task to include it in the project.  Let’s start with Address Cleansing for our project.

Unison Data Validation Tasks

Address Cleansing with Unison

For Address Cleansing, the Task will match the input files to common address fields in Unison.  The process starts with identifying the Input Fields then the Output Fields which will be ultimately used in the final step of the process.  The Output Fields are the validated Melissa results to ensure the final data is accurate. Once we review and save the configuration, we will return to the main project design window.

Name Parsing with Unison

Our next step is to perform Naming Parsing with Unison that follows a similar paradigm as the Address Task.  This includes input and output fields with validation as well as separation of each name segment.

Name Parsing with Unison

When we return back to the project, the current status with the Input and Tasks are completed.  Our final configuration step is to define the output.

Unison Data Validation Project In Progress

Unison Outputs

Just like with the Unison Inputs, the Outputs can be both to a File or Table in SQL Server, Oracle or MySQL.  With the Write to File option shown below, the name, extension, enclosure and delimited are included.  Further, the file can be compressed as a Zip or GZip file once built.  Another unique feature is the ability to filter the file output based on Full address matches, full phone number matches, etc.

Write to a File in Unison as an Output Task

As a final step prior to outputting the data, Unison provides the ability to review the Output Fields, turn columns on / off with the slider and rearrange the column order.  The legend on the right specifies the color coding associated with the Output Fields.

Review Output Fields in Unison

Final Project Review

Now that the project is configured, there are three options on the top right of the interface including:

  1. Run Now
  2. Preview Output
  3. Schedule

Final Data Cleansing Project in Unison

Preview Output

From the Preview Output option, you can review all columns and the first set of rows to ensure the data is going to be delivered in the needed format.  If not, you can update the project to your requirements.

Output Preview in Unison

Run Now in Unison

If the data is correct based on the Preview Output, then Run the job, review the status and download the final output file the next steps in your business processes.  Whether that means copying the file to a directory for the data to be loaded by an automated process built by SQL Server Professionals, encrypting the file then sharing with team members or partners, etc.

Job Status in Unison

If you click on the View Reports button, you are able to see the results from both the Name and Address Parsing in either a graphical or tabular format for the types of changes made as well as errors recorded.

Unison Reporting

Unison Scheduling

The final option for the project is to schedule the job to run either daily, weekly, monthly or a customized time frame. Below is the intuitive interface to schedule a job daily.

Schedule a Job to run daily in Unison

Unison also allows the opportunity to update and review the Jobs to meet daily management and monitoring needs.

Unison Scheduling Interface

How do I get started with Unison?

  1. Think about the numerous hours and relive the frustration of meeting critical deadlines that your business and technology groups face when trying to cleanse, validate and load data among various systems.
  2. If you are experiencing SSIS Nightmares with constantly changing requirements, code and too many SSIS Packages to count, then consider Unison as a technology that you can integrate into your processes. This enables your Power Users to validate the data and your SQL Server Professionals to manage the data in production environments.
  3. Get a personal demo of Unison from a member of the Melissa team and then put Unison through its paces in your environment with your data.  See how easy it is to validate your data and streamline tedious business processes that have not included data validation.
  4. Communicate how Unison can help your organization:
    1. Time savings for data validation and cleansing
    2. Put the power of data quality in the hands of your data stewards
    3. No programming experience necessary, only knowledge of the data
    4. Reduce the burden of your SQL Server Professionals to focus on production data
    5. Introduce data quality to new processes to improve the value of your data
  5. After working with Unison, measure the tangible and intangible benefits and determine your next steps to radically streamline business processes and put the power of data quality in the hands of your power users.
Next Steps


Last Updated: 2019-04-10


next webcast button


next tip button



About the author
MSSQLTips author Jeremy Kadlec Since 2002, Jeremy Kadlec has delivered value to the global SQL Server community as an MSSQLTips.com co-founder and Edgewood Solutions SQL Server Consultant.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools