SQL Server 2017 and Python Basics

Overview

Python is a widely used high-level interpreted programming language for general-purpose programming. Python has a design philosophy that emphasizes code readability and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.

Explanation

Widely used Python libraries include NumPy, pandas and scikit-learn, among others. scikit-learn is a machine learning library for the Python programming language. NumPy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational or labeled data both easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python. Like R, Python has a community-based development model and offers many powerful data processing libraries. Python and R are two of the most widely used programming languages for data science.
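
As a brief illustration of the NumPy and pandas descriptions above, the following sketch builds a small array and a small labeled table. The sample values and column names are invented for illustration and do not come from the tutorial's datasets.

```python
import numpy as np
import pandas as pd

# NumPy: a 2 x 3 array with vectorized math (sample values are made up).
matrix = np.array([[1, 2, 3], [4, 5, 6]])
column_totals = matrix.sum(axis=0)   # sum down each column
doubled = matrix * 2                 # element-wise multiplication

# pandas: labeled, table-like data (hypothetical column names).
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 150, 200, 50],
})
total_by_region = sales.groupby("region")["amount"].sum()

print(column_totals)
print(total_by_region)
```

The NumPy operations work on the whole array at once, while the pandas grouping is roughly what a T-SQL developer would write as a GROUP BY with SUM.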

SQL Server Machine Learning Services installs an open source distribution of Python, as well as packages provided by Microsoft that support distributed and/or parallel processing. The architecture is designed so that external Python scripts run in a separate process from SQL Server. Machine Learning Services integrates the Python language with SQL Server, which makes it possible to perform analytics close to the data and eliminates the costs and security risks associated with data movement.

The traditional data analytics methodology relies on transporting and transforming data from OLTP databases to data warehouses to data marts, using PowerShell for administration, SQL Server Integration Services (SSIS) for ETL, SQL Server Analysis Services (SSAS) for multi-dimensional / in-memory analytics, and SQL Server Reporting Services (SSRS) for reporting. Until now, set-based operations and algebraic manipulation in T-SQL have been the best available option for working with data stored in OLTP databases. Using Python with T-SQL extends data science, statistical computing, machine learning and other advanced predictive analytics capabilities to OLTP systems.

This tutorial is intended to help experienced T-SQL developers, DBAs, data analysts and data science enthusiasts start using the Python language with T-SQL. This makes it possible to run data science and analytics tasks close to the actual data, without following the traditional BI methodology of transporting and transforming data across repositories. The tutorial is structured in five lessons, each focused on the key points listed below.

Lesson 1: Python in the SQL Server Ecosystem

What Python in SQL Server means for developers, DBAs, data analysts and data scientists
Python with T-SQL compared to Python versus T-SQL
Applications of Python in SQL Server
Python tools in the SQL Server Ecosystem
Python and SQL Server Database Engine Integration Architecture

Lesson 2: Installing Machine Learning Services

Install SQL Server 2017 RC2 with Machine Learning Services
Install VS 2017 with Python tools
Explore Python installation and tools

Lesson 3: Basic Python Concepts

Python version, Libraries, Datasets
Variables, Comments, Printing Data
Arithmetic, Operators, Loops
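
The topics in this lesson can be previewed with a minimal sketch like the one below. It uses only the standard library, and the variable names and values are hypothetical examples rather than code from the lesson itself.

```python
import sys

# Print the interpreter version (the value depends on your installation).
print(sys.version)

# Variables are created by assignment; no type declaration is needed.
unit_price = 12.5
quantity = 4

# Arithmetic and comparison operators.
total = unit_price * quantity      # multiplication
remainder = 17 % 5                 # modulo
is_large_order = total > 40        # comparison yields a bool

# A for loop over a range, accumulating a running sum.
running_sum = 0
for n in range(1, 6):              # iterates over 1, 2, 3, 4, 5
    running_sum += n

print(total, remainder, is_large_order, running_sum)
```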

Lesson 4: Python with T-SQL

Using sp_execute_external_script
Reading data in Python from T-SQL
Returning manipulated data from Python to T-SQL
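
A minimal sketch of the Python side of this exchange follows. When a script runs through sp_execute_external_script, SQL Server by default exposes the result of the input query as a pandas DataFrame named InputDataSet and returns whatever DataFrame is assigned to OutputDataSet. The stand-in DataFrame and its columns are hypothetical sample data, included only so the script can also be tried outside SQL Server.

```python
import pandas as pd

# Stand-in for the rows SQL Server would pass in from the T-SQL input query.
# Hypothetical sample data; omit this assignment when running inside SQL Server.
InputDataSet = pd.DataFrame({
    "product": ["A", "B", "C"],
    "price": [10.0, 20.0, 30.0],
})

# Manipulate the data: add a column holding half of each price.
result = InputDataSet.copy()
result["half_price"] = result["price"] * 0.5

# The DataFrame assigned to OutputDataSet is what is returned to T-SQL.
OutputDataSet = result
print(OutputDataSet)
```

Inside SQL Server, the body of this script (without the stand-in assignment) would be passed as the script text, and the rows of OutputDataSet would come back as an ordinary result set.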

Lesson 5: Data Analysis with Python

Graphical analysis with Python
Using Python scripts in a stored procedure
Useful Resources

By the end of this tutorial, you should be able to develop basic Python scripts that read data from OLTP databases and apply graphical analysis. So, let’s get started with the first lesson to understand the influence of Python in SQL Server and its impact on the SQL Server community.

Siddharth Mehta

Siddharth has more than 14 years of experience in the IT industry, with more than a decade of experience in Business Intelligence and Analytics, for clients in banking, logistics, government, media and entertainment, products, life sciences and other domains. He has been a lead architect for a portfolio of 40+ apps spanning web, mobile, BI, analytics, data warehousing, reporting, collaboration, CMS, NoSQL and other technologies. He has several certifications and is a published author for online and print-media publications, as well as the MSDN Library.

In his present role, he is responsible for architecture design, technology stack selection, infrastructure design, third-party product evaluation and procurement, and performance engineering. These applications use technologies like Elasticsearch / Lucene, MongoDB, SharePoint 2013 and 2010, jQuery-based frameworks like Highcharts and GoJS, SQL Server and the Microsoft Business Intelligence stack (SSIS, SSAS, SSRS, MDX, PowerPivot, PowerView), jQuery Mobile, Bootstrap, iOS Xcode, and many others.

MSSQLTips Awards: Champion (100+ tips) – 2018 | Author of the Year – 2017 | Author Contender – 2016, 2018-2019

