Data Mining Algorithms Overview
By: Siddharth Mehta
The data mining process applies different algorithms to a dataset to analyze patterns in the data and make predictions. SQL Server Analysis Services (SSAS) includes data mining capabilities that comprise a number of algorithms. These algorithms can be categorized by the purpose the mining model serves. It is useful to know these categories before implementing an actual data mining model, which we will start in the next chapter.
Below are the categories, some typical applications of each, and the SSAS algorithms that support them.
Classification: Predicting a discrete attribute
- Flag the customers in a prospective buyers list as good or poor prospects.
- Calculate the probability that a server will fail within the next 6 months.
- Categorize patient outcomes and explore related factors.
- Supporting Algorithms: Microsoft Decision Trees, Microsoft Naive Bayes, Microsoft Clustering, Microsoft Neural Network
Regression: Predicting a continuous attribute
- Forecast next year's sales.
- Predict site visitors given past historical and seasonal trends.
- Generate a risk score given demographics.
- Supporting Algorithms: Microsoft Decision Trees, Microsoft Time Series, Microsoft Linear Regression
Sequence Analysis: Predicting a sequence
- Perform clickstream analysis of a company's Web site.
- Analyze the factors leading to server failure.
- Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities.
- Supporting Algorithms: Microsoft Sequence Clustering
Association: Finding groups of common items in transactions
- Use market basket analysis to determine product placement.
- Suggest additional products to a customer for purchase.
- Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities.
- Supporting Algorithms: Microsoft Association, Microsoft Decision Trees
Segmentation: Finding groups of similar items
- Create patient risk profile groups based on attributes such as demographics and behaviors.
- Analyze users by browsing and buying patterns.
- Identify servers that have similar usage characteristics.
- Supporting Algorithms: Microsoft Clustering, Microsoft Sequence Clustering
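Several algorithms appear under more than one category, which is easy to miss when reading the lists one at a time. As a rough illustration only, the lists above can be written as a small Python lookup table. This is not part of SSAS; the names `ALGORITHMS_BY_CATEGORY` and `categories_for` are hypothetical helpers introduced here, and the algorithm names are plain strings copied from the lists.

```python
# Category -> supporting algorithms, transcribed from the lists in this chapter.
# This is an illustrative lookup table, not an SSAS API.
ALGORITHMS_BY_CATEGORY = {
    "Classification": [
        "Microsoft Decision Trees",
        "Microsoft Naive Bayes",
        "Microsoft Clustering",
        "Microsoft Neural Network",
    ],
    "Regression": [
        "Microsoft Decision Trees",
        "Microsoft Time Series",
        "Microsoft Linear Regression",
    ],
    "Sequence Analysis": ["Microsoft Sequence Clustering"],
    "Association": ["Microsoft Association", "Microsoft Decision Trees"],
    "Segmentation": ["Microsoft Clustering", "Microsoft Sequence Clustering"],
}


def categories_for(algorithm):
    """Return the categories, in listing order, whose lists include the algorithm."""
    return [
        category
        for category, algorithms in ALGORITHMS_BY_CATEGORY.items()
        if algorithm in algorithms
    ]


if __name__ == "__main__":
    # Show which purposes a given algorithm can serve.
    for name in ("Microsoft Decision Trees", "Microsoft Clustering"):
        print(name, "->", ", ".join(categories_for(name)))
```

Looking an algorithm up this way shows, for example, that Microsoft Decision Trees is listed under three of the five categories, so choosing an algorithm does not by itself fix the purpose of the mining model.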
To move forward with the tutorial, you will need a complete installation of SSAS along with a sample SSAS OLAP database installed on the SSAS instance. Install and configure both on your machine before you start the next chapter.
- Consider installing and configuring SSAS in Multidimensional and Data Mining mode. You can read more about this here.
- Download the sample Adventure Works 2014 OLAP database backup and restore it on the SSAS instance. You can read more about the restore process here.