Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Explaining The SQL Server 2014 Analysis Services Data Mining Model Lift Chart


By:   |   Read Comments   |   Related Tips: > Analysis Services Development

Attend our free MSSQLTips Webcast - How to Simplify Routine SQL Server Administration Tasks


Problem

How do I interpret the Lift Chart found on the Mining Accuracy tab of a SQL Server 2014 Analysis Services Data Mining structure?

Solution

In this tip, we will examine different lift charts produced by the data mining models for three different sets of data. The first set of data, stored in tblLiftChart1, will classify perfectly with no false positives or false negatives. The second set of data, stored in tblLiftChart2, will cause the classification algorithm to predict with 50 percent accuracy. The third set of data, stored in tblLiftChart3, will cause the classification algorithm to generate 20 percent false positives and false negatives. Each table will have 20,000 rows and we will use the Neural Network algorithm to classify the data with 30 percent of the records held out for testing. Let's begin by creating three small tables with the following T-SQL Code.

--drop the tables for this example if they exist
IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblLiftChart1')
 drop table dbo.tblLiftChart1
go
IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblLiftChart2')
 drop table dbo.tblLiftChart2
go
IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblLiftChart3')
 drop table dbo.tblLiftChart3
go
create table dbo.tblLiftChart1
(
 pkLiftChart integer identity(1,1) not null Primary Key,
 x decimal(3,2) not null,
 y decimal(3,2) not null,
 class varchar(3) not null
)
create table dbo.tblLiftChart2
(
 pkLiftChart integer identity(1,1) not null Primary Key,
 x decimal(3,2) not null,
 y decimal(3,2) not null,
 class varchar(3) not null
)
create table dbo.tblLiftChart3
(
 pkLiftChart integer identity(1,1) not null Primary Key,
 x decimal(3,2) not null,
 y decimal(3,2) not null,
 class varchar(3) not null
)

Next, let's populate our sample tables and then select the row count of the tables.

SET NOCOUNT ON
GO
--Populate tblLiftChart1
declare @i integer=1
begin transaction
while @i<=10000
begin
  insert into dbo.tblLiftChart1 values (rand()+1.0, rand()+1.0,'YES')--100% true positive
  insert into dbo.tblLiftChart1 values (rand(), rand(), 'NO')        --100% true negative
  set @i=@i+1
end
commit
go

--Populate tblLiftChart2
declare @i integer=1
begin transaction
while @i<=5000
begin
  insert into dbo.tblLiftChart2 values (rand()+1.0, rand()+1.0,'YES')--50% true positive
  insert into dbo.tblLiftChart2 values (rand(), rand(), 'NO')        --50% true negative
  insert into dbo.tblLiftChart2 values (rand()+1.0, rand()+1.0,'NO') --50% false negative
  insert into dbo.tblLiftChart2 values (rand(), rand(), 'YES')       --50% false positive
  set @i=@i+1
end
commit
go

--Populate tblLiftChart3
declare @i integer=1
begin transaction
while @i<=8000
begin
  insert into dbo.tblLiftChart3 values (rand()+1.0, rand()+1.0,'YES')--80% true positive
  insert into dbo.tblLiftChart3 values (rand(), rand(), 'NO')        --80% true negative
  set @i=@i+1
end
commit
go

declare @i integer=1
begin transaction
while @i<=2000
begin
  insert into dbo.tblLiftChart3 values (rand()+1.0, rand()+1.0,'NO') --20% false negative
  insert into dbo.tblLiftChart3 values (rand(),rand(),'YES')         --20% false positive
  set @i=@i+1
end
commit

SELECT count(*) FROM dbo.tblLiftChart1
SELECT count(*) FROM dbo.tblLiftChart2
SELECT count(*) FROM dbo.tblLiftChart3

I created a SQL Server 2014 Analysis Services Multidimensional and Data Mining Project in Visual Studio. The Solution Explorer window for this project is shown below. There is one mining structure per table.

Solution Explorer

The three tables we just created are included in the data source view as shown below.

Data Source View

Each mining structure is similar to what is shown next.

Mining Structure

Each mining model is also similar with pkLiftChart as the primary key, X and Y are the input columns and Class is the predicted column.

Mining Model

After deploying and processing the data mining models, we open the LiftChart1 data mining object, click on the Mining Accuracy Chart tab and then the Classification Matrix tab. We see the Classification Matrix for the mining model built from the data in tblLiftChart1. The classification matrix is also known as a confusion matrix in some data mining texts. The classification matrix is built from the 30 percent of the data held out for testing. The sum of the counts in the matrix is 6,000 which is 30 percent of the 20,000 rows in tblLiftChart1. We see there are 2,991 true positives and 3,009 true negatives.

Classification matrix for LiftChart1

On the Input Selection tab, let's leave the Predict Value blank. This is very important because the lift chart behaves differently when there is a value selected in the Predict Value column.

Input selection for LiftChart1

Now let's click on the Lift Chart tab to display the Lift Chart and Mining Legend. The Mining Legend shows that two lines should be displayed, one for LiftChart1 and one for the Ideal Model. For a perfect classification model, as we have for LiftChart1, the Ideal Model is on top of the LiftChart1 line. The solid gray vertical bar can be clicked and moved horizontally to examine different values along the plotted line in the Mining Legend window. The Mining Legend window shows that for 50 percent of the overall population, 50 percent were correctly predicted with a 99.97 percent probability that the mining algorithm can predict correctly.

Lift Chart for LiftChart1 at 50 percent


Mining Legend for LiftChart1 at 50 percent

When we slide the gray vertical bar to the 80 percent overall population value, the Mining Legend window shows that for 80 percent of the overall population, 80 percent were correctly predicted with a 99.91 percent probability that the mining algorithm can predict correctly.

Lift Chart for LiftChart1 at 80 percent


Mining Legend for LiftChart1 at 80 percent

We don't live in a perfect world, so let's move to LiftChart2. While the numbers in the Classification Matrix are not equal, the value for sensitivity is 0.51 and specificity is 0.49 which is expected for the data in tblLiftChart2.

Classification matrix for the LiftChart2

Once again we will leave the Predict Value blank on the Input Selection tab.

Input selection for LiftChart2

When we click on the Lift Chart for the LiftChart2 mining model, we see the line for the LiftChart2 separated from the Ideal Model line. For a perfect classification model, as we have for LiftChart1, the Ideal Model is on top of the LiftChart1 line. For the LiftChart2 model, the Mining Legend window shows that for 50 percent of the overall population, 25 percent were correctly predicted with a 50.72 percent probability that the mining algorithm can predict correctly.

Lift Chart for LiftChart2 at 50 percent


Mining Legend for LiftChart2 at 50 percent

When we slide the gray vertical bar to the 80 percent overall population value, the Mining Legend window shows that for 80 percent of the overall population, 40 percent were correctly predicted with a 50.29 percent probability that the mining algorithm can predict correctly.

Lift Chart for LiftChart2 at 80 percent


Mining Legend for LiftChart2 at 80 percent

Now we will turn our attention to LiftChart3. The classification matrix shows approximately 80 percent of the data is a true positive or true negative while approximately 20 percent of the data is a false positive or false negative. This corresponds with the population of the source table we created above.

Classification matrix for the LiftChart3

Still leaving the Predict Value blank on the Input Selection tab.

Input selection for LiftChart3

We can see that the LiftChart3 line and the Ideal Model lines are still separated, but not as much as in the previous example. For the LiftChart3 model, the Mining Legend window shows that for 50 percent of the overall population, 40 percent were correctly predicted with a 99.98 percent probability that the mining algorithm can predict correctly.

Lift Chart for LiftChart3 at 50 percent


Mining Legend for LiftChart3 at 50 percent

When we slide the gray vertical bar to the 80 percent overall population value, the Mining Legend window shows that for 90 percent of the overall population, 72 percent were correctly predicted with a 99.77 percent probability that the mining algorithm can predict correctly.

Lift Chart for LiftChart3 at 80 percent


Mining Legend for LiftChart3 at 80 percent

Summary

When the Predict Value on the Input Selection tab is blank, a classification algorithm is deemed to be better as its Lift Chart line approaches the Ideal Model line. Please note that currently there is not a Receiver Operating Characteristic (ROC) curve plot in Analysis Services, so we have to use the Lift Chart to graphically represent the quality of the classifier.

Next Steps

Try variations in the percentage of false positives and false negatives. Also, check out these other tips on data mining in SQL Server Analysis Services.



Last Update:


signup button

next tip button



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Learn more about SQL Server tools