Basics of Machine Learning with SQL Server 2017 and R
By: Siddharth Mehta
The traditional BI methodology has focused on sourcing data from disparate source systems and augmenting the data in a data warehouse / data lake. This data repository acts as the master source for different purposes like data marts, reporting, and data mining. All these forms of data analysis requires the end user to apply analytical thinking to interpret the results. Machine Learning is an advanced form of analysis where the system / model learns from the data fed to the model and derives intelligence for predicting analysis. The analysis depends on the machine learning model development process that is composed of exploratory data analysis, data transformation / modeling, model development, model training, model testing and model improvisation.
Many professionals often think that their experience with databases already covers exploratory data analysis skills. Database professionals are fluent with data analysis which is more of a database model / query logic assessment. The exploratory data analysis involved in machine learning systems is statistical in nature and is often termed as data science.
Machine Learning has a deep root in statistics, which is required to build a solid foundation for basics of data science for exploratory data analysis. We can classify statistics in two broad categories descriptive and inferential, which is widely used in machine learning model development.
SQL Server hosted data has the advantage of a pre-defined schema and T-SQL constructs. ETL tools like SSIS offer the advantage to transform the data at a faster pace and broader scale. Assuming the data is properly structured and treated for errors during data capture / data quality, exploratory data analysis can be applied over this data, which is the basic step in machine learning model development. Model development, model training and model testing follow this analysis.
In this tutorial, we will learn the basics of machine learning where we will learn the necessary data science to examine data in relevance to machine learning model development. We will be using R in SQL Server 2017 to apply machine learning related techniques and analysis. In case you are new to R, you can get quickly get up to speed by following the R Tutorial here.
The structure of this tutorial is listed below.
- Machine Learning Terms and Concepts
- What is Machine Learning and the reason to learn it?
- Applications of Machine Learning
- Types of Machine Learning
- Machine Learning support in Microsoft Technology Stack
- Basics of Statistics
- Use of statistics in Machine Learning
- Branches of Statistics
- Statistics Terms and Concepts Population, Sample, Independent and Dependent variable, Probability Distribution
- How to study just enough statistics for Machine Learning
- Useful Links to reference material
- Basic Data Science and Exploratory Data Analysis with SQL Server 2017 and
- Measures of central tendency Mean, Median, Mode
- Measures of dispersion Range, Variance, Standard Deviation, Inter-quartile Range
- Measures of association Correlation, Covariance
- How to derive the above in SQL Server 2017 hosted data using R
- Basic Graphical Analysis for Machine Learning with SQL Server 2017 and R
- Kernel Density Plot
- Scatter Plot
- Box Plot
- Machine Learning Model Development Lifecycle with SQL Server 2017 and R
- Machine Learning model life cycle
- Linear Regression Model Case Study
- Model Development, Model Storage, Model Extraction, Model Usage
- Prediction and Model Testing
After reading the course agenda, I am sure you are excited to get started with the basics of machine learning. So let's start with the question of "What is machine learning?".