Microsoft Naïve Bayes Data Mining Model in SQL Server Analysis Services

By:   |   Comments (6)   |   Related: > Analysis Services Development


Problem

Many Data Mining or Machine Learning students have trouble making the transition from a Data Mining tool such as WEKA [1] to the data mining functionality in SQL Server Analysis Services.  Can you provide some assistance with this transition to Microsoft's Naive Bayes Data Mining algorithm?  Check out this tip to learn more.

Solution

The solution presented here takes a classic example of the Naive Bayes classification algorithm from Data Mining and Machine Learning seen in differing variations in textbooks by Quinlan [2], Mitchell [3], Han, Kamber and Pei [4], and the WEKA application. Also in this tip, a demonstration is given on how to use the SSAS Mining Model Prediction feature to predict the class label of an unknown object.

The first step is to create a table and load it with data using the T-SQL sample below.

IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblNaiveBayesExample')
    DROP TABLE dbo.tblNaiveBayesExample;
create table dbo.tblNaiveBayesExample 
(
  PKey integer identity(1,1) Primary Key,
  Outlook varchar(8) not null,
  Temperature varchar(4) not null,
  Humidity varchar(6) not null,
  Windy varchar(5) not null,
  Play varchar(3) not null
)
insert into dbo.tblNaiveBayesExample values ('sunny','hot','high','FALSE','no')
insert into dbo.tblNaiveBayesExample values ('sunny','hot','high','TRUE','no')
insert into dbo.tblNaiveBayesExample values ('overcast','hot','high','FALSE','yes')
insert into dbo.tblNaiveBayesExample values ('rainy','mild','high','FALSE','yes')
insert into dbo.tblNaiveBayesExample values ('rainy','cool','normal','FALSE','yes')
insert into dbo.tblNaiveBayesExample values ('rainy','cool','normal','TRUE','no')
insert into dbo.tblNaiveBayesExample values ('overcast','cool','normal','TRUE','yes')
insert into dbo.tblNaiveBayesExample values ('sunny','mild','high','FALSE','no')
insert into dbo.tblNaiveBayesExample values ('sunny','cool','normal','FALSE','yes')
insert into dbo.tblNaiveBayesExample values ('rainy','mild','normal','FALSE','yes')
insert into dbo.tblNaiveBayesExample values ('sunny','mild','normal','TRUE','yes')
insert into dbo.tblNaiveBayesExample values ('overcast','mild','high','TRUE','yes')
insert into dbo.tblNaiveBayesExample values ('overcast','hot','normal','FALSE','yes')
insert into dbo.tblNaiveBayesExample values ('rainy','mild','high','TRUE','no')
select * from dbo.tblNaiveBayesExample
go

The output from the T-SQL code is shown below. This data will be used as the input for the data mining algorithm.

The output from the T-SQL code is shown below

In Visual Studio (also known from the start menu as SQL Server Data Tools), create a new Analysis Services Multidimensional and Data Mining Project. In this tip, we will name the project Naive Bayes Example. Click on OK when finished with the New Project window.

create a new Analysis Services Multidimensional and Data Mining Projec

In the Solution Explorer window, right-click on the Data Sources folder and choose "New Data Source..." to initiate the Data Source Wizard. Click on "Next >".

right-click on the Data Sources folder and choose "New Data Source

Choose your data connection, if one exists. If a Data connection does not exist, click on "New..." to create a new data connection. In this example, we are using a connection to the Tips database on the localhost. Click on "Next >".

click on "New..." to create a new data connection

On the Impersonation Information screen, click on "Use a specific Windows user name and password." Enter your username and password. Click on "Next >".

click on "Use a specific Windows user name and password."

On the Completing the Wizard screen, the data source name can be changed if desired. Click on "Finish".

the data source name can be changed if desired

In the Solution Explorer window, right-click on the Data Source Views folder and choose "New Data Source View..." to launch the Data Source View Wizard. Click on "Next >".

choose "New Data Source View..."

On the Select a Data Source page in the Relational data sources window, select the data source we created in the above step. Click on "Next >".

select the data source we created in the above step. Click on "Next >".

On the Name Matching page, check the box for "Create logical relationships by matching columns." In the Foreign key matches box, press the "Same name as primary key" button. Click on "Next >".

check the box for "Create logical relationships by matching columns."

On the Select Tables and Views page, move tblNaiveBayesExample from the Available Objects box to the Included object box by selecting tblNaiveBayesExample in the Available objects box and then clicking on the ">" box. Click on "Next >".

move tblNaiveBayesExample from the Available Objects box to the Included object box by selecting tblNaiveBayesExample in the Available objects box

On the Completing the Wizard page, give the Data Source View a name and click on "Finish".

give the Data Source View a name and click on "Finish".

The Solution Explorer should appear as it does below with one Data Source and one Data Source View defined.

The Solution Explorer should appear as it does below with one Data Source and one Data Source View defined.

In the Solution Explorer window, right-click on the Mining Structures folder and choose "New Mining Structure..." to launch the Data Mining Wizard. Click on "Next >".

right-click on the Mining Structures folder and choose "New Mining Structure..."

On the Select the Definition Method page, press the radio button labeled "From existing relational database or data warehouse". Click on "Next >".

press the radio button labeled "From existing relational database or data warehouse".

On the Create the Data Mining Structure page, press the radio button labeled "Create mining structure with a mining model". Choose the "Microsoft Naive Bayes" data mining technique from the drop-down box.

press the radio button labeled "Create mining structure with a mining model".

On the Select Data Source View page, choose "Tips" from the Available data source views. Please note this is the data source view we created earlier. Click on "Next >".

the Select Data Source View page

On the Specify Table Types page, make sure the Case box is checked and the Nested box is unchecked for the table named tblNaiveBayesExample.

make sure the Case box is checked

On the Specify the Training Data page, in the Key column check the box that corresponds to the PKey column. In the Input column, check the boxes for the Humidity, Outlook, Temperature, and Windy columns. These input columns will be used as input to the Naive Bayes algorithm. In the Predictable column, check the box for the Play column which is the column the Naive Bayes algorithm will attempt to predict. Click on "Next >".

In the Input column, check the boxes for the Humidity, Outlook, Temperature, and Windy columns.

On the Specify Columns' Content and Data Type page, we see the columns to be used in the mining model structure, along with their content and data types. Click on "Next >".

the Specify Columns' Content and Data Type page

On the Create Testing Set page, we will set the "Percentage of data for testing" and "Maximum number of cases in testing data set" to zero for this example. Please note that there needs to be a set of data reserved for testing or use 10-fold cross validation to prevent overfitting the data mining model to the training data. Click on "Next >".

we will set the "Percentage of data for testing" and "Maximum number of cases in testing data set" to zero for this example

On the Completing the Wizard page, the name of the mining structure and model can be changed. Click on "Finish".

On the Completing the Wizard page, the name of the mining structure and model can be changed.

The image below shows the Visual Studio Solution Explorer window and the Mining Structure tab of the data mining model that was created in the steps above.

the Visual Studio Solution Explorer window and the Mining Structure tab of the data mining model that was created

The next image from Visual Studio shows the Mining Model tab of the data mining model that was created in the steps above.

the Mining Model tab of the data mining model that was created

Under the Mining Model tab, right click on the box labeled "Microsoft_Naive_Bayes" to bring up the menu shown below and select "Set Algorithm Parameters..." to display the Algorithm Parameters window.

Under the Mining Model tab, right click on the box labeled "Microsoft_Naive_Bayes"

In the Algorithm Parameters window, set the MINIMUM_DEPENDENCY_PROBABILITY 0.1. The description box explains the meanings of the parameters and their values. Click on OK.

In the Algorithm Parameters window, set the MINIMUM_DEPENDENCY_PROBABILITY 0.1

Next, click on the Mining Model Viewer tab. If you get the message shown below about the server content appearing to be out of date and asking "Would you like to build and deploy the project first?", click on Yes.

click on the Mining Model Viewer tab

If you get the message shown below about the amount of time it will take to process the mining model, click on "Yes" to continue because we only have 14 records.

click on "Yes" to continue because we only have 14 records.

When the Process Mining Model - Naive Bayes Example window pops up, leave the default settings and click on Run...

When the Process Mining Model - Naive Bayes Example window pops up, leave the default settings and click on Run...

The Process Progress window will display. When the process successfully completes as shown below, click on Close to return focus to the Process Mining Model window, and then click on Close to return focus to the Mining Model Viewer window.

The Process Progress window will display

The Mining Model Viewer window's Dependency Network tab shows the dependencies among the attributes. The legend at the bottom explains the color coding of the graph. Clicking on the Play node shows that the attributes, Outlook, Windy and Humidity are used to predict Play.

The Mining Model Viewer window's Dependency Network tab shows the dependencies among the attributes.

If we move the vertical slider on the far left from All Links to Strongest Links, we will see that the link between Humidity and Play is the strongest.

see that the link between Humidity and Play is the strongest.

Since Naive Bayes uses probabilities to predict an object's classification, we can use the Attribute Characteristics tab to see the strength of each attributes' values in predicting the class value. Based on the Strongest Links in the image above, we expect Humidity to be the strongest. When we click on the Attribute Characteristics tab for the attribute Play and for both values "yes" and "no", we see the Humidity attribute to be the strongest.

we can use the Attribute Characteristics tab to see the strength of each attributes' values in predicting the class value.


click on the Attribute Characteristics tab for the attribute Play and for both values "yes" and "no", we see the Humidity attribute to be the strongest.

The final step of this tip is to predict an unknown class value based on a new set of attribute values. Let's click on the Mining Model Prediction tab.

Let's click on the Mining Model Prediction tab

Click on the "Select Case Table..." button to display the Select Table window. Select tblNaiveBayesExample and then click on OK.

Click on the "Select Case Table..." button to display the Select Table window

We can see where the mining model columns map to the input table columns.

We can see where the mining model columns map to the input table columns

Next, right click on the table name in the Select Input Table(s) box and choose Singleton Query. This will allow us to select or enter values for a query against the data mining model to predict an unknown class label.

Next, right click on the table name in the Select Input Table(s) box and choose Singleton Query

We will select normal humidity, sunny outlook, mild temperature and not windy in the Singleton Query Input box. In the middle of the window, we will select "Prediction Function" in the Source box, Predict in the Field box, check the Show box, and drag the Play attribute from the Mining Model box to the Criteria/Argument box as shown below. On the next line, we will select Naive Bayes Example in the Source box, and the Play attribute in the Field box.

We will select normal humidity, sunny outlook, mild temperature and not windy in the Singleton Query Input box.

Now we are ready to see the predicted class value for the Play attribute. Let's select Result and the predicted value for the Play attribute is Yes.

Now we are ready to see the predicted class value for the Play attribute.


Let's select Result and the predicted value for the Play attribute is Yes.
Next Steps
References

[1] Witten, I.H., E. Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2005.

[2] Quinlan, R., C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.

[3] Mitchell, T., Machine Learning, McGraw Hill, 1997.

[4] Han, J., M. Kamber and J. Pei, Data Mining: Concepts and Techniques, Elsevier, 2012.



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Saturday, March 4, 2017 - 5:40:36 PM - Alaa Back To Top (47199)

selection Result at the final step i get error message, what do you think ?


Monday, November 30, 2015 - 7:35:12 AM - Richard Holy Back To Top (40156)

Could anyone explain how to calculate "Likelihood" and "Probability" of the calculated class in SQL server analysis? Also if you have a multinominal output of predicted column (more than 2 classes/states), can you explain how to display all the possible values with it's probability? I want to achieve "top 10 recommended values" sorted by probability.


Monday, July 7, 2014 - 3:36:27 PM - Jeannine Back To Top (32580)

Nice to see a walkthrough using a known sample dataset -- helps to interpret the model. And nice demo of the prediction query builder!


Tuesday, April 8, 2014 - 4:08:59 AM - elahe Back To Top (29999)

hello

i am from iran-mashhad and i verry happy to find your help.

tank you verry mach


Sunday, March 23, 2014 - 10:10:07 PM - mayy Back To Top (29840)

i have error

to connect to the (local)\SQL EXPRESS server. Make sure that the server is started.

when click on the Mining Model Viewer


Wednesday, January 29, 2014 - 2:03:07 AM - pradeep Back To Top (29270)

This is very much great and hope fully nice blog. Every body can easily found her need able information. I am visit first time but I fond many use full article. I will back again when get time.















get free sql tips
agree to terms