Interpreting the SQL Server Analysis Services Classification Matrix
The Classification Matrix is found in the Mining Accuracy Chart section of the SQL Server Analysis Services Mining Structure object within Visual Studio. Classification matrices are also known as confusion matrices because they show how often a classification algorithm confuses one class label with another when labeling a given record. What do the numbers in the SQL Server Analysis Services Classification Matrix represent?
This tip explains the numbers in the classification matrix for a binary classifier.
Binary classifiers attempt to categorize data into one of two categories such as True/False, Yes/No, Y/N or Positive/Negative. The numbers in the classification matrix show the counts of true positives, true negatives, false positives and false negatives. For this tip, we will use the class labels of "positive" and "negative" to help simplify the explanation.
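To make the four counts concrete, here is a minimal Python sketch that tallies them from paired predicted and actual labels. The helper name `count_outcomes` and the sample labels are made up for illustration; they are not part of SQL Server Analysis Services.

```python
def count_outcomes(predicted, actual, positive="positive"):
    """Tally the four binary outcomes from paired predicted/actual labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for pred, act in zip(predicted, actual):
        if pred == positive:
            # Predicted positive: correct only if the actual label is positive too.
            counts["TP" if act == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the actual label is negative too.
            counts["TN" if act != positive else "FN"] += 1
    return counts


# Made-up labels for illustration only.
predicted = ["positive", "positive", "negative", "negative", "positive"]
actual = ["positive", "negative", "negative", "positive", "positive"]

outcomes = count_outcomes(predicted, actual)
print(outcomes)
```

Each record lands in exactly one of the four buckets, so the counts always sum to the number of records scored.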
The classification matrix in SQL Server Analysis Services lists the predicted values on the rows with the actual values in the columns. In the figure below, the true positive count is highlighted. A true positive occurs when the predicted class label and the actual class label are both positive. In other words, the algorithm has made a correct prediction that a particular tuple is positive.
The image below highlights the true negative count. A true negative occurs when the predicted class label and the actual class label are both negative. In other words, the algorithm has made a correct prediction that a particular tuple is negative.
The next two figures get a little interesting because we are highlighting the counts for when the algorithm has made an incorrect prediction. The image below shows the false negative count. False negatives are when the predicted value is negative, but the actual value is positive. Here, the algorithm has predicted negative for 221 tuples whose actual value is positive.
This same algorithm has made 204 false positive predictions. False positives are when the predicted value is positive, but the actual value is negative.
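The layout described above, with predicted values on the rows and actual values in the columns, can be sketched in Python as follows. The function names `build_matrix` and `render` are hypothetical. The false negative (221) and false positive (204) counts come from this tip; the true positive and true negative values are placeholders chosen only so the grid is complete, since the text does not state them.

```python
def build_matrix(tp, tn, fp, fn):
    """Arrange counts with predicted labels on rows and actual labels on columns."""
    return {
        "positive": {"positive": tp, "negative": fp},  # predicted-positive row
        "negative": {"positive": fn, "negative": tn},  # predicted-negative row
    }


def render(matrix, labels=("positive", "negative")):
    """Return the matrix as a plain-text grid."""
    headers = [f"{label} (Actual)" for label in labels]
    lines = ["Predicted".ljust(12) + "".join(h.rjust(20) for h in headers)]
    for pred in labels:
        cells = "".join(str(matrix[pred][act]).rjust(20) for act in labels)
        lines.append(pred.ljust(12) + cells)
    return "\n".join(lines)


# fn and fp are the counts quoted in this tip; tp and tn are placeholders.
matrix = build_matrix(tp=1000, tn=1000, fp=204, fn=221)
print(render(matrix))
```

In this orientation the correct predictions sit on the diagonal, and the two off-diagonal cells hold the errors. Other tools may transpose the layout and put actual values on the rows, so it is worth checking the axis labels before reading the counts.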
False positives and false negatives do occur. The goal of classification is to maximize the true positive and true negative counts without overfitting the model to the training data to the point where it cannot accurately predict previously unseen records.
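One common way to summarize the correct predictions is accuracy, the share of all records that fall in the true positive or true negative cells. The sketch below compares accuracy on training data with accuracy on held-out data; a large gap between the two is one possible sign of overfitting. All counts here are made up for illustration, and the `accuracy` helper is hypothetical rather than an Analysis Services function.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


# Made-up counts: the same model scored on training data and on unseen data.
train_acc = accuracy(tp=480, tn=470, fp=20, fn=30)
holdout_acc = accuracy(tp=400, tn=390, fp=100, fn=110)

gap = train_acc - holdout_acc
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {holdout_acc:.2f}")
print(f"gap:               {gap:.2f}")
```

Accuracy alone can mislead when one class is much rarer than the other, so the individual false positive and false negative counts are still worth reading.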
Check out these other tips on data mining in SQL Server Analysis Services.
- SQL Server 2012 Analysis Services Association Rules Data Mining Example
- Explaining the Calculations of Probability and Importance for Complex Association Rules in SQL Server 2012 Analysis Services
- Classic Machine Learning Example In SQL Server Analysis Services
- Microsoft Naïve Bayes Data Mining Model in SQL Server Analysis Services
- Data Mining Clustering Example in SQL Server Analysis Services SSAS
- SQL Server Analysis Services Glossary