SQL Server 2017 and Python Basics


By:
Overview

Python is a widely used high-level interpreted programming language for general-purpose programming. Python has a design philosophy that emphasizes code readability and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.

Some of the widely used python based libraries are Numpy, Pandas, Scikit and others. Scikit is a machine learning library for the Python programming language. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational or labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Python has a community based development model, and it contains many powerful data processing libraries like R. Python and R are the two of the most widely used programming languages for data science.

SQL Server Machine Learning Services installs an open source distribution of Python, as well as packages provided by Microsoft that support distributed and/or parallel processing. The architecture is designed such that external scripts using Python run in a separate process from SQL Server. Machine Learning Services integrates the Python language with SQL Server, which helps in performing analytics close to the data and eliminate the costs and security risks associated with data movement.

The traditional data analytics methodology relies on transporting and transforming data from OLTP databases > Data Warehouses > Data Marts using PowerShell for administration, SQL Server Integration Services (SSIS) for ETL, SQL Server Analysis Services (SSAS) for multi-dimensional / in-memory analytics, and SQL Server Reporting Services (SSRS) for reporting. Data manipulation using set based operations and mathematical algebra has been the best possible solution with T-SQL on data stored in OLTP databases. Using Python with T-SQL extends the power of data science, statistical computing, machine learning and other advanced predictive analytics capabilities to OLTP systems.

This tutorial is intended to help experienced T-SQL Developers, DBAs, Data Analysts and Data Science enthusiasts to start using Python language with T-SQL. This enables data science and analytics tasks closest to the actual data, without the need to follow a traditional BI methodology of transporting and transforming data across repositories. The tutorial is structured in five lessons, with each lesson focused on explaining corresponding key points as mentioned below.

Lesson 1: Python in the SQL Server Ecosystem

  • What Python in SQL Server means for developers, DBAs, data analysts and data scientists
  • Python with T-SQL compared to Python versus T-SQL
  • Applications of R in SQL Server
  • Python tools in the SQL Server Ecosystem
  • Python and SQL Server Database Engine Integration Architecture

Lesson 2: Installing Machine Learning Services

  • Install SQL Server 2017 RC2 with Machine Learning Services
  • Install VS 2017 with Python tools
  • Explore Python installation and tools

Lesson 3: Basic Python Concepts

  • Python version, Libraries, Datasets
  • Variables, Comments, Printing Data
  • Arithmetic, Operators, Loops

Lesson 4: Python with T-SQL

  • Using sp_execute_external_scripts
  • Reading data in Python from T-SQL
  • Returning manipulated data from Python to T-SQL

Lesson 5: Data Analysis with Python

  • Graphical analysis with Python
  • Using Python scripts in a stored procedure
  • Useful Resources

By the end of this tutorial, you should be able to develop basic Python scripts that read data from OLTP databases and apply graphical analysis. So, let's get started with the first lesson to understand the influence of Python in SQL Server and its impact on the SQL Server community.






Comments For This Article




Friday, July 29, 2022 - 5:06:11 AM - keerthi ravichandran Back To Top (90319)
Well-written and very informative. There were many things I learned about Python and Sql. Thanks for sharing this article and its really useful for me.

Thursday, April 16, 2020 - 10:27:21 PM - Nai Biao Zhou Back To Top (85394)

##Yes, we can run .sql file from python. We just need to read the sql file into the a string. We also need to check the string and make sure all SQL syntax are correct,

## I tested the following codes successfully. 

import pyodbc

def CreateConnection():
    conn = pyodbc.connect(
        "Driver={SQL Server};"
        "Server=.;"
        "Database=AdventureWorksDW2017;"
        "Trusted_Connection=Yes"
        )
    return conn

def CloseConnection(conn):
    conn.close()

def readSQLQuery(filePath):
    'reads sql query from a file path\n\
     return strings'
    dataFile = open(filePath, "r")
    strQuery = ''.join(dataFile.read().splitlines())#remove \n
    dataFile.close()
    return strQuery

def Read(conn):
    sqlQuery = readSQLQuery("SQLQuery.sql")
    cursor = conn.cursor()
    cursor.execute(sqlQuery)
    for row in cursor:
        print(row)


def main():
    conn = CreateConnection()
    cursor = Read(conn)
    CloseConnection(conn)

if __name__ == '__main__':
    main()

 


Tuesday, May 15, 2018 - 2:01:29 PM - Said Back To Top (75948)

Hi,

 

Can I ran .sql file from python. I don't need to embed the sql code into python file. I need the python file to call out the sql file and execute it. Thanks. 

 


Sunday, January 14, 2018 - 7:09:07 AM - Rich Back To Top (74954)

 Hello, I have worked with MS SQL for many years but am new to Python. This is a great tip, but I am unable to reproduce the scatter plots shown. I have downlloaded the AdventureWorksDW DB and restored in SQL Server 2017 with imbedded Python (and R). Using the code to create MyPythonTestData I came up with 5347 rows from the 60xxx rows in the Factxxx tables. Running the three charting samples the resulting plots are much more linear and the three outliers circled are not present. Code presented was use exactly as shown and the three files created, just look different. Don't know enough about Python to understand why the difference. Any ideas?

Thanks for all the great tips.

Rich

 















get free sql tips
agree to terms