SQL Server 2017 and Python Basics


Python is a widely used high-level interpreted programming language for general-purpose programming. Python has a design philosophy that emphasizes code readability and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.

Some of the widely used python based libraries are Numpy, Pandas, Scikit and others. Scikit is a machine learning library for the Python programming language. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational or labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Python has a community based development model, and it contains many powerful data processing libraries like R. Python and R are the two of the most widely used programming languages for data science.

SQL Server Machine Learning Services installs an open source distribution of Python, as well as packages provided by Microsoft that support distributed and/or parallel processing. The architecture is designed such that external scripts using Python run in a separate process from SQL Server. Machine Learning Services integrates the Python language with SQL Server, which helps in performing analytics close to the data and eliminate the costs and security risks associated with data movement.

The traditional data analytics methodology relies on transporting and transforming data from OLTP databases > Data Warehouses > Data Marts using PowerShell for administration, SQL Server Integration Services (SSIS) for ETL, SQL Server Analysis Services (SSAS) for multi-dimensional / in-memory analytics, and SQL Server Reporting Services (SSRS) for reporting. Data manipulation using set based operations and mathematical algebra has been the best possible solution with T-SQL on data stored in OLTP databases. Using Python with T-SQL extends the power of data science, statistical computing, machine learning and other advanced predictive analytics capabilities to OLTP systems.

This tutorial is intended to help experienced T-SQL Developers, DBAs, Data Analysts and Data Science enthusiasts to start using Python language with T-SQL. This enables data science and analytics tasks closest to the actual data, without the need to follow a traditional BI methodology of transporting and transforming data across repositories. The tutorial is structured in five lessons, with each lesson focused on explaining corresponding key points as mentioned below.

Lesson 1: Python in the SQL Server Ecosystem

  • What Python in SQL Server means for developers, DBAs, data analysts and data scientists
  • Python with T-SQL compared to Python versus T-SQL
  • Applications of R in SQL Server
  • Python tools in the SQL Server Ecosystem
  • Python and SQL Server Database Engine Integration Architecture

Lesson 2: Installing Machine Learning Services

  • Install SQL Server 2017 RC2 with Machine Learning Services
  • Install VS 2017 with Python tools
  • Explore Python installation and tools

Lesson 3: Basic Python Concepts

  • Python version, Libraries, Datasets
  • Variables, Comments, Printing Data
  • Arithmetic, Operators, Loops

Lesson 4: Python with T-SQL

  • Using sp_execute_external_scripts
  • Reading data in Python from T-SQL
  • Returning manipulated data from Python to T-SQL

Lesson 5: Data Analysis with Python

  • Graphical analysis with Python
  • Using Python scripts in a stored procedure
  • Useful Resources

By the end of this tutorial, you should be able to develop basic Python scripts that read data from OLTP databases and apply graphical analysis. So, let's get started with the first lesson to understand the influence of Python in SQL Server and its impact on the SQL Server community.

Last Update: 9/18/2017

More SQL Server Solutions

Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

Email me updates

Signup for our newsletter

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.

Tuesday, May 15, 2018 - 2:01:29 PM - Said Back To Top



Can I ran .sql file from python. I don't need to embed the sql code into python file. I need the python file to call out the sql file and execute it. Thanks. 


Sunday, January 14, 2018 - 7:09:07 AM - Rich Back To Top

 Hello, I have worked with MS SQL for many years but am new to Python. This is a great tip, but I am unable to reproduce the scatter plots shown. I have downlloaded the AdventureWorksDW DB and restored in SQL Server 2017 with imbedded Python (and R). Using the code to create MyPythonTestData I came up with 5347 rows from the 60xxx rows in the Factxxx tables. Running the three charting samples the resulting plots are much more linear and the three outliers circled are not present. Code presented was use exactly as shown and the three files created, just look different. Don't know enough about Python to understand why the difference. Any ideas?

Thanks for all the great tips.




get free sql tips

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.

Learn more about SQL Server tools