Data Mining Build Model with OLTP Data Source
By: Siddharth Mehta
Let's consider a business scenario that can be addressed by applying data mining as a solution. Let's say that based on certain attributes, we want to classify customers into different categories of membership, as found with most retail companies. After identifying unique characteristics of the membership categories, we intend to send a targeted mailing to those groups. This will enable the marketing department of the company to design the promotion strategy based on the characteristics of the group, and result in a better probability of making sales. The Adventure Works data warehouse that we installed in the previous chapter contains data designed to analyze this business case. So, let's take the first step to apply data mining to this business case.
When a new data mining model is built, SSAS retrieves data from the data source and stores it in a proprietary format. Building several mining models from the same data set can result in redundant data stored in SSAS. To share data across several mining models, Analysis Services stores the information about the data that can be shared across several mining models in an object called a Mining Structure. Internally, the information read from relational data sources is stored in a cube to efficiently retrieve the data during mining model creation.
As a first step to create a data mining model for our business case, we need to create a data source, data source view and data mining structure. Follow the below steps.
1) Open SQL Server Data Tools and create a new SSAS project. Add a data source pointing to the Adventure Works Data warehouse.
2) Create a Data Source View that points to the vTargetMail view from the data source created in Step 1. vTargetMail is a view that contains information about customers who buy bicycles. Based on the information in the view, we can identify potential customers who are likely to buy bicycles. The vTargetMail view has been specifically designed to contain patterns that can be identified by the data mining algorithms, can be categorized based on their attributes, and customized mail can be sent to customers based on their likelihood to buy bicycles.
3) Right-click the Mining Structures folder in the Solution Explorer, and select New Mining Structure to launch the Data Mining Wizard.
4) Select the From Existing Relational Database or Data Warehouse radio button, and click Next.
5) On the Create the Data Mining Structure page, select Microsoft Decision Trees and click Next.
6) On the Select Data Source View page, select Adventure Works that contains vTargetMail and click Next.
7) The Specify Table Types page, shows two selections: Case (the primary table) and Nested. Certain algorithms are used for problems such as market-basket analysis and the need to analyze data across multiple tables. In such cases, you need to select certain tables as Nested tables.
8) On the Specify the Training Data page of the wizard, you select the columns from the source tables that are to be used in creation of the mining models. In addition, you need to specify whether a specific column should be used as a key column, input column, or predictable column. If you specify a column as an input column, Analysis Services uses this column as an input to the mining model for determining patterns. If a specific column is marked as predictable, Analysis Services enables you to predict this column for a new data set based on the existing model if the input columns for the new data set are provided. Select the BikeBuyer column as predictable by enabling the check box and select the columns Age, CommuteDistance, EnglishEducation, Gender, HouseOwnerFlag, MaritalStatus, NumberCarsOwned, NumberChildrenAtHome, Region, TotalChildren, and YearlyIncome as input columns, and click Next.
9) Make all of the Continuous content types Discrete except Yearly Income, and click Next.
10) In the Create Testing Set page, accept the default value of 30% for the percentage of data for testing, and click Next.
11) Provide appropriate names for the mining structure and mining model that will contain the structure. Check the Allow drill through to allow drill through while analyzing the model and click Finish to save.
This completes the creation of the data mining structure. In the next chapter we will develop and explore the data mining model.
- Consider exploring more about the Microsoft Decision Trees Data Mining Algorithm from here.