Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Metadata Driven SQL Server Integration Services Data Flow Transformations Using BIML


By:   |   Last Updated: 2016-11-08   |   Comments   |   Related Tips: More > Integration Services Development

Problem

I read the previous tip to implement metadata driven SSIS development, but how do I extend the functionality to implement SSIS data flow transformations?

Solution

During a transformation, a set of business/technical rules are applied to the data, so that we can load the data to the target tables. Data cleansing is a key step in the transformation process.

Some examples of transformations are:

  • Excluding NULL values for key columns
  • Lookup against dimension tables to get surrogate keys
  • Derive new column values based on business rules
  • Aggregating data based on columns
  • Deduplication, ordering, data enrichment, etc.

In this tip, we will learn and practice the concept of metadata management for data transformations.

There are many transformations which exists in SSIS to solve various business problems. We are going to take two transformations as an example to understand the concept. Once you are familiar with the concept you can extend the design to accommodate other transformations.  The two methods we will look at are:

  • Derived Column Transformations
  • Lookup Transformations

Building an Example Model

In this example we are going to add a derived column transformation to the data flow task. Let’s have a quick look at the database design to accommodate this transformation.

Prerequisites

It is assumed that you have already reviewed the previous tip about building a metadata model for a data warehouse. The database script from the previous tip becomes the basis for this tip.

Database Design and Stored Procedures

The below script will enhance the metadata model by creating tables for the data transformation.

USE [MetadataModel]
GO

create proc [dbo].[ListLookupHeaderForCustomer]
as
Select LTH.TransformationShortName,
LTrim(RTrim(DP.Name)) as ConnectionName,
LTH.SourceQuery 
from [dbo].[LookupTransformationHeader] LTH
INNER JOIN dbo.DataPackage DP 
on DP.DataPackageID = LTH.SourceDataPackageID
GO

create proc [dbo].[ListLookupInputColumnDetails]
as
Select LTH.TransformationShortName,LTD.ColumnName from [dbo].[LookupTransformationHeader] LTH
INNER JOIN [dbo].[LookupTransformationDetail] LTD
on LTH.LookupTransformationHeaderID = LTD.LookupTransformationHeaderID
and LTD.Type='IN'
GO

create proc [dbo].[ListLookupOutputColumnDetails]
as
Select LTH.TransformationShortName,LTD.ColumnName from [dbo].[LookupTransformationHeader] LTH
INNER JOIN [dbo].[LookupTransformationDetail] LTD
on LTH.LookupTransformationHeaderID = LTD.LookupTransformationHeaderID
and LTD.Type='OUT'
GO

Create proc [dbo].[ListTransformationForCustomerCopy] as 
SELECT DTH.TransformationShortName,DTD.DerivedColumnName,DTD.Expression,
DTD.DataType,DTD.[DataLength],DTD.ReplaceExisting 
FROM [dbo].[TransformationStep] TS 
INNER JOIN [dbo].[DerivedColumnTransformationHeader] DTH 
on TS.TransformationStepID = DTH.DerivedTransformationHeaderID 
INNER JOIN [dbo].[DerivedColumnTransformationDetail] DTD 
on DTH.DerivedTransformationHeaderID = DTD.DerivedColumnTransformationHeaderID 
GO

CREATE TABLE [dbo].[DerivedColumnTransformationDetail](
[DerivedColumnTransformationHeaderID] [int] NOT NULL,
[DerivedColumnTransformationDetailID] [int] NOT NULL,
[DerivedColumnName] [nvarchar](250) NOT NULL,
[DerivedColumnType] [nvarchar](250) NOT NULL,
[Expression] [nvarchar](200) NOT NULL,
[DataType] [nvarchar](50) NOT NULL,
[DataLength] [nvarchar](50) NOT NULL,
[Precision] [nvarchar](50) NOT NULL,
[Scale] [nvarchar](50) NOT NULL,
[CodePage] [nvarchar](50) NOT NULL,
[ReplaceExisting] [nvarchar](50) NULL,
CONSTRAINT [PK_TransformationDerivedColumn] PRIMARY KEY CLUSTERED 
(
[DerivedColumnTransformationDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
CREATE TABLE [dbo].[DerivedColumnTransformationHeader](
[DerivedTransformationHeaderID] [int] IDENTITY(1,1) NOT NULL,
[TransformationShortName] [nvarchar](50) NOT NULL,
[TransformationComments] [nvarchar](200) NULL,
CONSTRAINT [PK_DerivedColumnTransformationHeader] PRIMARY KEY CLUSTERED 
(
[DerivedTransformationHeaderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
CREATE TABLE [dbo].[LookupTransformationDetail](
[LookupTransformationDetailID] [int] NOT NULL,
[LookupTransformationHeaderID] [int] NOT NULL,
[ColumnName] [nvarchar](50) NOT NULL,
[DataType] [nvarchar](50) NOT NULL,
[DataLength] [nvarchar](50) NOT NULL,
[Type] [nchar](10) NOT NULL,
CONSTRAINT [PK_LookupTransformationDetail] PRIMARY KEY CLUSTERED 
(
[LookupTransformationDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
CREATE TABLE [dbo].[LookupTransformationHeader](
[LookupTransformationHeaderID] [int] NOT NULL,
[TransformationShortName] [nvarchar](250) NOT NULL,
[SourceDataPackageID] [int] NOT NULL,
[NoMatchConfiguration] [nvarchar](250) NOT NULL,
[SourceQuery] [nvarchar](4000) NOT NULL,
CONSTRAINT [PK_LookupTransformationHeader] PRIMARY KEY CLUSTERED 
(
[LookupTransformationHeaderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
CREATE TABLE [dbo].[TransformationStep](
[MappingID] [int] NOT NULL,
[TransformationStepName] [nvarchar](50) NOT NULL,
[TransformationType] [nvarchar](50) NOT NULL,
[TransformationSequenceID] [int] NULL,
[TransformationStepID] [int] IDENTITY(1,1) NOT NULL,
[TransformationID] [int] NULL,
CONSTRAINT [PK_TransformationStep_1] PRIMARY KEY CLUSTERED 
(
[TransformationStepID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
ALTER TABLE [dbo].[DerivedColumnTransformationDetail] WITH CHECK ADD CONSTRAINT [FK_DerivedColumnTransformationDetail_DerivedColumnTransformationHeader] FOREIGN KEY([DerivedColumnTransformationHeaderID])
REFERENCES [dbo].[DerivedColumnTransformationHeader] ([DerivedTransformationHeaderID])
GO
ALTER TABLE [dbo].[DerivedColumnTransformationDetail] CHECK CONSTRAINT [FK_DerivedColumnTransformationDetail_DerivedColumnTransformationHeader]
GO

Relationships between transformation tables can be seen below.

SQL Server Database Design for the Data Transformation

Table Explanation

The above script should have created the transformation tables. Usage of these tables is provided below.

TransformationStep Table

This table stores the transformation details. If there are "n" transformations between the source and target, it is expected to have all "n" transformations in this table. Only a basic definition of the transformation will be available in this table. The detailed definition of the transformation will be stored in transformation specific tables.

This table stores information about:

  • MappingID – Unique key to identify the data flow
  • TransformationType – It defines the type of transformation (example DER for derived column transformation)
  • TransformationSequenceID – It helps to arrange multiple transformations in numeric order

DerivedColumnTransformationHeader Table

This is a DerivedColumnTransformation specific table and it stores high level information.

This table stores information about:

  • TransformationShortName – Transformation name that can be used within the data flow
  • TransformationComments – Detailed comments about the transformation
  • TransformationHeaderID – Surrogate key to uniquely identify a transformation

DerivedColumnTransformationDetail Table

This table stores column level details for the transformation.

This table stores information about:

  • TransformationHeaderID – Help to link the transformationHeader
  • TransformationDetaiID – Surrogate key
  • DerivedColumnName
  • Expression – Actual expression used in the Derived column transformation
  • DataType, DataLength – Data type related details
  • ReplaceExisting – Flag to indicate whether the given expression is to replace existing column

ListTransformationForCustomerCopy Stored Procedure

This stored procedure will join the TransformationStep, DerivedColumnTransformationHeader and DerivedColumnTransformationDetail tables to return the dataset to build the derived column transformation.

Load Sample Data

Let's insert some data into the tables.

USE [MetadataModel]
GO

INSERT INTO [dbo].[TransformationStep]
([MappingID]
,[TransformationStepName]
,[TransformationType]
,[TransformationSequenceID]
,[TransformationID])
VALUES
(1
,'Derive New Column'
,'DER'
,1
,1)
GO
INSERT INTO [dbo].[DerivedColumnTransformationHeader]
([TransformationShortName]
,[TransformationComments])
VALUES
('DE - Upper Case'
,'Convert Name to Upper Case')
GO
INSERT INTO [dbo].[DerivedColumnTransformationDetail]
([DerivedColumnTransformationHeaderID]
,[DerivedColumnTransformationDetailID]
,[DerivedColumnName]
,[DerivedColumnType]
,[Expression]
,[DataType]
,[DataLength]
,[Precision]
,[Scale]
,[CodePage]
,[ReplaceExisting])
VALUES
(1
,1
,'Name'
,'DER'
,'UPPER(NAME)'
,'String'
,'100'
,'NA'
,'NA'
,'NA'
,'true')
GO
INSERT INTO [dbo].[LookupTransformationHeader]
([LookupTransformationHeaderID]
,[TransformationShortName]
,[SourceDataPackageID]
,[NoMatchConfiguration]
,[SourceQuery])
VALUES
(1
,'LKP - Phonenumber'
,1
,'NA'
,'Select Name,PhoneNumber from dbo.CustomerLookup')
GO
INSERT INTO [dbo].[LookupTransformationDetail]
([LookupTransformationDetailID]
,[LookupTransformationHeaderID]
,[ColumnName]
,[DataType]
,[DataLength]
,[Type])
VALUES
(1
,1
,'Name'
,'String'
,'250'
,'IN'),
(2
,1
,'PhoneNumber'
,'String'
,'250'
,'OUT')
GO

Derived Column Transformation

The stored procedure ListTransformationForCustomerCopy will be called from the BIMLScript to get the Derived Column Transformation details for the CustomerCopy data flow task. The returned dataset will be stored in a DataTable. As we have only one derived column expression, we can make use of the first retuned row to build the transformation.

The below BIML script will help us generate an SSIS package based on the configuration in the Metadata Repository.

<#@ import namespace="System.Data" #> 
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; 
string connectionString =""; 
DataTable dtConnection = new DataTable(); 
SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); 
sqldaStaging.Fill(dtConnection); 
DataRow rwConnection = dtConnection.Rows[0];

DataTable dtTableMapping = new DataTable();
SqlDataAdapter sqldaTableMapping = new SqlDataAdapter("exec [dbo].[ListCustomerCopyMapping];", strConnection);
sqldaTableMapping.Fill(dtTableMapping);
DataRow rwTableMapping = dtTableMapping.Rows[0];

DataTable dtTransformation = new DataTable();
SqlDataAdapter sqldaTransformation = new SqlDataAdapter("exec [dbo].[ListTransformationForCustomerCopy];", strConnection);
sqldaTransformation.Fill(dtTransformation);
DataRow rwTransformation = dtTransformation.Rows[0];
#>
<Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" />

</Connections>
<Packages>
<Package Name="staging" ConstraintMode="Linear">
<Tasks>
<ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> 
<DirectInput> Truncate table dbo.customer_copy </DirectInput> 
</ExecuteSQL>
<Dataflow Name="DFT Copy Customer Data">
<Transformations> 
<OleDbSource Name="Source" ConnectionName="<#=rwTableMapping["ConnectionName"]#>" >
<DirectInput>SELECT * FROM <#=rwTableMapping["SourceTableName"]#></DirectInput> 
</OleDbSource> 
<DerivedColumns Name="<#=rwTransformation["TransformationShortName"]#>"> 
<Columns> 
<Column ReplaceExisting="<#=rwTransformation["ReplaceExisting"]#>" Name="<#=rwTransformation["DerivedColumnName"]#>" DataType="<#=rwTransformation["DataType"]#>" Length="<#=rwTransformation["DataLength"]#>" > <#=rwTransformation["Expression"]#> 
</Column> 
</Columns> </DerivedColumns> 

<OleDbDestination Name="Target" ConnectionName="<#=rwTableMapping["ConnectionName"]#>"> 
<ExternalTableOutput Table="<#=rwTableMapping["DestinationTableName"]#>"/> 
</OleDbDestination> 
</Transformations> 
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

</Biml>

Similar design approaches can be followed for other row by row transformations.

BIML generated SSIS Package for Data Flow Task Transformations

 

Lookup Transformation

Now we can apply the same logic to generate a Lookup transformation in SSIS. The lookup transformation is different from other transformations in that we need additional tables to store information associated with the lookup transformation.

LookupTransformationHeader Table

This table stores high level information about lookup transformation.

This table stores information about:

  • LookupTransformationHeaderID – Surrogate key for the transformation header
  • TransformationShortName – Name used in the data flow
  • SourceDataPackageID – Used to derive the connection manager
  • NoMatchConfiguration – Value to define in case of no matching records returned
  • Source Query – SQL Query used to get the lookup data

LookupTransformationDetail Table

This table stores column level details for the transformation.

This table stores information about:

  • LookupTransformationDetailID – Surrogate Key for the detail table
  • LookupTransformationHeaderID – Helps to link the header table
  • Column Name
  • DataType
  • DataLength
  • Type –Input or Output column

The below BIML script will help us build the transformation components from the Metadata Repository.

<#@ import namespace="System.Data" #> 
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
string strConnection = @"Data Source=localhost\SQL2012;Initial catalog=MetadataModel; Integrated security=SSPI;"; 
string connectionString =""; 
DataTable dtConnection = new DataTable(); 
SqlDataAdapter sqldaStaging = new SqlDataAdapter("exec [dbo].[ListconnectionDetails];", strConnection); 
sqldaStaging.Fill(dtConnection); 
DataRow rwConnection = dtConnection.Rows[0];

DataTable dtTableMapping = new DataTable();
SqlDataAdapter sqldaTableMapping = new SqlDataAdapter("exec [dbo].[ListCustomerCopyMapping];", strConnection);
sqldaTableMapping.Fill(dtTableMapping);
DataRow rwTableMapping = dtTableMapping.Rows[0];

DataTable dtTransformation = new DataTable();
SqlDataAdapter sqldaTransformation = new SqlDataAdapter("exec [dbo].[ListLookupHeaderForCustomer];", strConnection);
sqldaTransformation.Fill(dtTransformation);
DataRow rwTransformation = dtTransformation.Rows[0];

DataTable dtLKPTransformationInput = new DataTable();
SqlDataAdapter sqldaLKPTransformationInput = new SqlDataAdapter("exec [dbo].[ListLookupInputColumnDetails];", strConnection);
sqldaLKPTransformationInput.Fill(dtLKPTransformationInput);
DataRow rwLKPTransformationInput = dtLKPTransformationInput.Rows[0];

DataTable dtLKPTransformationOutput = new DataTable();
SqlDataAdapter sqldaLKPTransformationOutput = new SqlDataAdapter("exec [dbo].[ListLookupOutputColumnDetails];", strConnection);
sqldaLKPTransformationOutput.Fill(dtLKPTransformationOutput);
DataRow rwLKPTransformationOutput = dtLKPTransformationOutput.Rows[0];


#>
<Connection Name="<#=rwConnection["ConnectionName"]#>" ConnectionString="<#=rwConnection["ConnectionString"]#>" />

</Connections>
<Packages>
<Package Name="staging" ConstraintMode="Linear">
<Tasks>
<ExecuteSQL Name="Truncate Table" ConnectionName="<#=rwConnection["ConnectionName"]#>"> 
<DirectInput> Truncate table dbo.customer_copy </DirectInput> 
</ExecuteSQL>
<Dataflow Name="DFT Copy Customer Data">
<Transformations> 
<OleDbSource Name="Source" ConnectionName="<#=rwTableMapping["ConnectionName"]#>" >
<DirectInput>SELECT * FROM <#=rwTableMapping["SourceTableName"]#></DirectInput> 
</OleDbSource> 
<Lookup Name="<#=rwTransformation["TransformationShortName"]#>" OleDbConnectionName="<#=rwTransformation["ConnectionName"]#>" >
<DirectInput>
<#=rwTransformation["SourceQuery"]#>
</DirectInput>
<Inputs>
<Column SourceColumn="<#=rwLKPTransformationInput["ColumnName"]#>" />
</Inputs>
<Outputs>
<Column SourceColumn="<#=rwLKPTransformationOutput["ColumnName"]#>" />
</Outputs>
</Lookup>

<OleDbDestination Name="Target" ConnectionName="<#=rwTableMapping["ConnectionName"]#>"> 
<ExternalTableOutput Table="<#=rwTableMapping["DestinationTableName"]#>"/> 
</OleDbDestination> 
</Transformations> 
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

BIML generated SSIS Package for the Lookup Transformation

 

Scope For Further Improvement

In this example, we have seen only one transformation for a data flow task, but in business scenarios we need multiple transformations to load data to target tables. The TransformationStep table plays a pivotal role by connecting the Mapping (Data flow) to all the transformations. The sequenceID in the TransformationStep table will help to sort the transformations based on the need.

A new stored procedure will need to be created to list out all the transformations for a given data flow task. This stored procedure will be called from the BIML script to generate the BIML code snippet based on the transformation.

Summary

This metadata model for transformation logic can be extended for other transformations such as Union All, Merge Join, Merge, Sort, etc. Based on the above examples we can see the metadata driven development can be accelerated with the help of BIMLScript.

Next Steps


Last Updated: 2016-11-08


next webcast button


next tip button



About the author
MSSQLTips author Nat Sundar Nat Sundar is working as an independent SQL BI consultant in the UK with a Bachelors Degree in Engineering.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools