Overview
The traditional BI methodology has focused on sourcing data from disparate source systems and augmenting the data in a data warehouse/data lake. This data repository acts as the master source for different purposes, like data marts, reporting, and data mining. All these forms of data analysis require the end user to apply analytical thinking to interpret the results. Machine Learning is an advanced form of analysis where the system/model learns from the data fed to the model and derives intelligence for predicting analysis. The analysis depends on the machine learning model development process that is composed of exploratory data analysis, data transformation/modeling, model development, model training, model testing, and model improvisation.
Explanation
Many professionals often think that their experience with databases already covers exploratory data analysis skills. Database professionals are fluent with data analysis, which is more of a database model/query logic assessment. The exploratory data analysis involved in machine learning systems is statistical in nature and is often termed as data science.
Machine Learning has a deep root in statistics, which is required to build a solid foundation for basics of data science for exploratory data analysis. We can classify statistics in two broad categories – descriptive and inferential, which is widely used in machine learning model development.
SQL Server hosted data has the advantage of a pre-defined schema and T-SQL constructs. ETL tools like SSIS offer the advantage to transform the data at a faster pace and broader scale. Assuming the data is properly structured and treated for errors during data capture / data quality, exploratory data analysis can be applied over this data, which is the basic step in machine learning model development. Model development, model training and model testing follow this analysis.
In this tutorial, we will learn the basics of machine learning where we will learn the necessary data science to examine data in relevance to machine learning model development. We will be using R in SQL Server 2017 to apply machine learning related techniques and analysis. In case you are new to R, you can get quickly up to speed by following the R Tutorial here.
The structure of this tutorial is listed below.
- Machine Learning Terms and Concepts
- What is Machine Learning and the reason to learn it?
- Applications of Machine Learning
- Types of Machine Learning
- Machine Learning support in Microsoft Technology Stack
- Basics of Statistics
- Use of statistics in Machine Learning
- Branches of Statistics
- Statistics Terms and Concepts – Population, Sample, Independent and Dependent variable, Probability Distribution
- How to study just enough statistics for Machine Learning
- Useful Links to reference material
- Basic Data Science and Exploratory Data Analysis with SQL Server 2017 and R
- Measures of central tendency – Mean, Median, Mode
- Measures of dispersion – Range, Variance, Standard Deviation, Inter-quartile Range
- Measures of association – Correlation, Covariance
- How to derive the above in SQL Server 2017 hosted data using R
- Basic Graphical Analysis for Machine Learning with SQL Server 2017 and R
- Histogram
- Kernel Density Plot
- Scatter Plot
- Box Plot
- Machine Learning Model Development Lifecycle with SQL Server 2017 and R
- Machine Learning model life cycle
- Linear Regression Model Case Study
- Model Development, Model Storage, Model Extraction, Model Usage
- Prediction and Model Testing
After reading the course agenda, I am sure you are excited to get started with the basics of machine learning. So let’s start with the question of “What is machine learning?”.
Siddharth has more than 14 years of experience in the IT Industry, with more than a decade of experience in Business Intelligence and Analytics, for clients banking, logistics, government, Media Entertainment, products, life sciences and other domains. He has been a lead architect for a portfolio of 40+ apps, containing apps in web, mobile, BI, Analytics, data warehousing, reporting, collaboration, CMS, NoSQL and other technologies. He has several certifications and is a published author for online and print-media publications, as well as the MSDN Library.
In his present role, he remains responsible for architecture design, technology stack selection, infrastructure design, 3rd party products evaluation and procurement, and performance engineering. These applications use technologies like Elasticsearch / Lucene, MongoDB, SharePoint 2013 and 2010, jQuery-based framework like Highcharts and GoJS, SQL Server and the Microsoft Business Intelligence stack (SSIS, SSAS, SSRS, MDX, PowerPivot, PowerView), jQueryMobile, Bootstrap, iOS xCode framework, and many others.
- MSSQLTips Awards: Champion (100+ tips) – 2018 | Author of the Year – 2017 | Author Contender – 2016, 2018-2019