![]() |
|
|
By: Ray Barley | Read Comments (1) | Print Ray is a Principal Architect at RDA Corporation and a MSSQLTips.com BI Expert. Related Tips: More |
|
Problem
We typically use the Lookup Data Flow Transform to retrieve the surrogate key for a dimension row based on the natural key in a fact table. While this works well, we occasionally have a situation where a fact table row has a natural key that isn't in our dimension table and we're looking for a way to handle this. What do you think?
Solution
The situation you describe is often referred to as early arriving facts. For a variety of reasons you tend to encounter this situation on occasion in a data warehousing environment. It may be a timing issue, a failed load from a particular source system, etc. A practical solution is to implement inferred member handling in your ETL processing for your dimension tables. An inferred member is a dimension row that has been created as a result of an early arriving fact. The inferred member will have the natural key from the fact table and default values for the rest of the columns. You probably want to add an indicator column in the dimension to identify the row as an inferred member; e.g. a bit column called InferredMember. The rationale behind inferred member processing is that it is only a temporary situation. In the very near future when the actual row is extracted from the source system and processed, you will simply perform a type 1 update, overwriting all columns in the inferred row with the values from the source system. You will also set the InferredMember indicator to 0 since the row is no longer inferred. For details on processing type 1 and type 2 changes please refer to our earlier tip on Handling Slowly Changing Dimensions.
Now that we have defined our solution at a high level, let's walk through the implementation. As an example we will process an Order fact table that references a Customer dimension. The schema for the Order staging and fact tables is as follows:
CREATE TABLE [dbo].[stg_fact_Order](
[nk_CustomerID] [nvarchar](255),
[OrderID] [int],
[OrderAmount] [money]
)
CREATE TABLE [dbo].[fact_Order](
[wk_Order] [int] identity NOT NULL,
[fk_Customer] [int] NOT NULL,
[OrderID] [int] NOT NULL,
[OrderAmount] [money] NOT NULL
) |
Note that in the above tables we have the nk_CustomerID column which is the source system natural key for the Customer dimension. The fk_Customer column will be populated with the Customer dimension surrogate key.
In order to facilitate efficient lookups in our ETL processing we will maintain a table in the staging database with the surrogate key, natural key, and inferred member indicator for the Customer dimension. The schema will be as follows:
CREATE TABLE [dbo].[tbl_CustomerLookup](
[wk_Customer] [int] IDENTITY NOT NULL,
[nk_CustomerID] [nvarchar](255) NOT NULL,
[InferredMember] [bit] NOT NULL
) |
In our Order fact ETL processing we will use the Lookup component in the Data Flow to get the Customer surrogate key by joining on the natural key in the CustomerLookup table. Prior to the Data Flow, however, we will insert a new row into the CustomerLookup table for any natural key that isn't already there; i.e. our early arriving facts. We will use the following stored procedure:
CREATE PROCEDURE [dbo].[stp_InsertInferredFromOrder] AS BEGIN INSERT INTO dbo.tbl_CustomerLookup ( nk_CustomerID ,InferredMember ) SELECT DISTINCT stg.nk_CustomerID ,1 FROM dbo.stg_fact_Order stg LEFT OUTER JOIN dbo.tbl_CustomerLookup c ON c.nk_CustomerID = stg.nk_CustomerID WHERE wk_Customer IS NULL END |
Any fact table that references the Customer dimension can follow the pattern in the above stored procedure of inserting a row into the CustomerLookup table to create an inferred member where necessary. At some point (typically at or near the end) in the ETL process the InferredMember rows in the CustomerLookup table are inserted into the Customer dimension, allowing joins between fact tables and the dimension to be successful.
Now that we have reviewed the overall solution, let's take a look at its implementation in an SSIS package. The Control Flow is as follows:
Main points about the Control Flow:
The Load fact Order Data Flow task is as follows:
Main points about the Load fact Order Data Flow:
Next Steps
| Share: | Share | Tweet |
|
![]() |
![]() |
Free SQL Server Learning |
| Tuesday, June 14, 2011 - 6:32:10 AM - Babu | Read The Tip |
|
If the Excel source file is given in the artical, Then the package and data is easily analysed. |
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |