Deep dive into Python scripts execution in SQL Server
By: Maria Zakourdaev | Updated: 2018-08-30 | Comments (1) | Related: More > Python
The SQL Server Launchpad service was introduced in SQL Server 2016, as an enabler of the “SQL Server R Services” feature. Since SQL Server 2017, the feature is called “SQL Server Machine Learning Services” and is used to easily develop and deploy solutions using R and Python 3.5 languages. In this tip I will focus on Python, however most of the configurations for R are quite similar.
As a huge fan of the Python language, I wanted to understand how Python support was implemented.
I have used Python procedures/functions in the AWS Redshift database and its implementation is very limited. You can write functions in Python, but you are limited to function scope and not allowed to add more Python libraries. Microsoft's implementation is much better from my point of view.
Moreover, I think the team that designed and added Python support had no idea how many tools and features they are adding. You are now able to connect to online resources, like for instance Forex Online Trading and get REALTIME conversion rates. You can do almost everything by using the Python rest API library (Rest API is a universal language that allows CRUD operations via the HTTP protocol). You can access and query any No-SQL data store and return the resultset to SQL Server client transparently.
Let’s explore how Microsoft implemented Python language support, find out which parameters can be changed and where and how we can troubleshoot the SQL Launchpad service during external script executions.
Script Execution Architecture
Microsoft is using Anaconda Python distribution (Python 3.5).
Python scripts are being launched externally to SQL Server, as a separate Windows processes (which is quite different from SQLCLR implementation, which is loaded inside the SQL Engine), so Python processes do not interfere with database transactions.
We need to enable external scripts execution:
EXEC sp_configure 'external scripts enabled',1
Now we can start running any Python script like this:
exec sp_execute_external_script @language =N'Python', @script=N'print ("Hello World")'
Launchpad.exe is starting python.exe runtime and transmitting the message with the script from sqlserver.exe to the Python.exe to be executed.
Microsoft has developed Python package RevoScalePy, which is spinning up BxlServer.exe processes which are used to efficiently transfer data from Python.exe, for instance the script final resultset or error messages using sqlsattelite API.
RevoScalePy package allows Python to process any size of dataset, even if it is bigger than the available server memory, efficiently moving data to and from temporary disk space.
In addition, RevoScalePy allows parallel data processing to leverage all server CPUs (Enterprise edition only feature).
Let’s drill down to the details
First, we will open the Launchpad service from the Services application. The Launchpad service runs under its own user account, which can be different from the one that the SQL Server service is using. This service depends on the SQL Server service, exactly like the SQL Server Agent service used to be. I was really surprised to see that in the SQL Server 2017 SQL Server Agent can run without the SQL Server service.
Let’s see the SQL Launchpad Service execution command and its parameters:
If we look closer at the command line, we will see all command line parameters (this is how it looks on my server, on your server it might be slightly different, depending on which SQL Server version was installed):
“C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\launchpad.exe"
- launcher RLauncher.dll
- pipename sqlsatellitelaunch
- timeout 600000
- logPath "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\LOG\ExtensibilityLog"
- workingDir "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData"
- externalLibrariesTempDir "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExternalLibrariesTemp"
- externalLibrariesDir "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExternalLibraries"
- satelliteDllPath "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn\sqlsatellite.dll"
As you can see, launchpad.exe and Pythonlauncher.dll can be found in the same directory as sqlserver.exe and sqlagent.exe. SQL Launchpad loads Python and R dlls when starts.
Note that the path to sqlsattelite.dll is also located in the same directory.
You can also find here the path to the log file <logPath> which you can use to troubleshoot the service behavior. Here is it, pythonlauncher.log:
However, there are no error messages for failed scripts in the pythonlauncher.log file. In the SQL Server Error Log you will see Python script failures, but you will have only:
The <timeout> property defines when the launchpad service shuts down the external runtimes that it has launched.
Pythonlauncher.dll is using the configuration file pythonlauncher.xml, you can also find it in the same directory. It is used to set the external runtime configurations.
Let’s take a look at the pythonlauncher.xml:
Here we have the <PYTHONHOME> parameter, path to the Python runtime. Similarly, in the same location inside SCRIPTS subfolder, you can find the pip command. Yes, the pip command which is used to add external libraries to use in the Python scripts! This is a great feature, I have experimented with many interesting Python packages, you can take a look, for instance, at my tip on how to use Slack inside Python for alert purposes. I will soon post another tip which will show how I offload the big data exploration to NoSQL datastores, like Elasticsearch transparently and return final resultset back to SQL Server clients.
The <TRACE_LEVEL> parameter in the pythonlanucher.xml is used to change the verbosity of the service logging using the file pythonlauncher.log (log file path we saw is being defined as one of the parameters in the server command string -logPath "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\LOG\ExtensibilityLog”). Logging can go between 1 - error and 4 - information. You will see lots of internal information in the log file, especially if you change the verbosity level to 4.
To support user isolation during the Python scripts execution, there are 20 Windows accounts that were created by the installation wizard. Two different sql server connections will get external processes running under different windows accounts. All processes from the single user connection pool will be executed under the same windows account.
If your concurrency requires more Windows users, you can change this parameter by using SQL Server Configuration Manager:
Here we can change Windows user count:
Another parameter from pythonlauncher.xml file, <WORKING_DIRECTORY> is a directory on the disk created for each account to store intermediate Python scripts execution data. To provide isolation to processes under the same Sql Server connection, each external script will use separate subfolder inside the same WorkingDirectory. This subfolder will be deleted after script has finished its execution. Moreover, after the script execution, Python process itself will be terminated and a new process will be launched to keep connection pool ready for the future executions.
Please see below WORKING_DIRECTORY, MSSQL\ExtensibilityData content here. Subfolders are GUIDS which are an id of the process that is using the subfolder for the intermediate query results.
Pythonlauncher is using the connection pool to avoid the Windows process spin up penalty of ~ 100 ms during the user query. By default, there are 5 processes in the connection pool. I still haven’t found how to change this magic number of the connection pool size. Let me know if you find where configuration of this parameter is located.
After Python scripts start running, we see in the Task Manager 5 Python processes. Using the wmic command, we can see whole command line that the Pythonlauncher used to spin up the process. You can see that the sessionid is being used for the process WORKING_DIRECTORY that we have mentioned earlier.
Notice, all 5 processes are using the same Windows account MSSQLSERVER01 and work inside MSSQLSERVER01 Working Directory.
Here we see all 5 Bxl processes which are running from the package RevoScalePy:
All installed packages, including RecoScalePy package, after installation using pip command, are located in the regular Python site-packages directory.
In today’s tip we have discussed the way Python scripts execution support have been implemented in SQL Server. I think the person who added Python support had no idea how many tools and features he is adding. You are now able to connect to online resources, like for instance Forex Online Trading and get REALTIME conversion rates. You can do almost everything by using Python rest API library (Rest API is a universal language that allows CRUD operation via HTTP protocol). You can access and query any No-SQL data store and return the resultset to SQL Server client transparently. Today we have discussed execution of such Python scripts can be configured and troubleshooted.
- Here is more information about Python from Microsoft.
- Check out all of the Python tips.
- Check out the SQL Server 2017 and Python Tutorial.
Last Updated: 2018-08-30
About the author
View all my tips