Introduction to SQL Server Machine Learning Services with Python

Problem

In 2017, SQL Server introduced support for Python language which opened the door for creating machine learning models using SQL Server. The SQL Server machine learning services along with Python support can be used to create a model that is capable of prediction.

In this tip, we will examine a dataset that contains information about variance, skewness, curtosis, and entropy of 1372 bank currency notes. The bank currency notes have been divided into two categories: fake or real. The task is to create a machine learning model using SQL Server which is trained on the banknote dataset and is able to predict if an unknown bank currency note is real or fake.

Solution

In this article, we will see how to develop a machine learning model that is capable of detecting a fake currency note. To do so, we need to follow the steps below.

Installing SQL Server With Machine Learning Services

Before you can execute Python scripts and perform machine learning tasks on SQL Server, you need to download a SQL Server instance with machine learning services enabled. The article explains the process of installing SQL Server with machine learning services enabled.

Enabling SQL Server Machine Learning Services

To enable the machine learning services, go to SQL Server Management Studio. If you have not already installed SQL Server Management Studio, you can download it from this link.

In the SQL Server Management Studio, open a new query window and type the following script:

EXEC sp_configure 'external scripts enabled', 1 
RECONFIGURE WITH OVERRIDE

The script above enables execution of any external scripts in SQL Server. If the above script executes successfully, you should see the following message.

Configuration option 'external scripts enabled' changed from 0 to 1. Run the RECONFIGURE statement to install.

Before the Python scripts can be executed, we need to restart the SQL Server. To do so, open the SQL Server Configuration Manager from the windows start menu. From the options on the left, select "SQL Server Services". You will see list of all the SQL Server Instances, running on your system as shown below. Right click the SQL Server instance that you installed along with machine learning services and click "Restart".

Restart SQL Server in the SQL Server Configuration Manager

Testing Python Scripts

Execute the following script:

EXEC <span style="color: #800000">sp_execute_external_script</span>
  @language <span style="color: #808080">= </span><span style="color: #ff0000">N'Python'</span><span style="color: #808080">,</span>
  @script<span style="color: #808080"> = </span><span style="color: #ff0000">N'print("Welcome to Python in SQL Server")'</span>
GO

You should see the following in the output

STDOUT message(s) from external script: 
Welcome to Python in SQL Server

Now you have everything that you need to execute machine learning services in SQL Server 2017.

Getting the Machine Learning Dataset

The dataset for the current problem can be downloaded from the UCI Machine Learning Repository. The dataset is originally in text format and you can download a CSV version from this GitHub Link. This dataset contains data that were extracted from images taken from genuine and forged banknote-like specimens. The dataset includes the following columns: variance, skewness, curtosis, entropy and class.

Next, create database and a table inside your SQL Server 2017 instance by executing the following command.

CREATE DATABASE BANK
GO
USE BANK
GO
CREATE TABLE [dbo].[banknote](
   [variance] [real] NULL,
   [skewness] [real] NULL,
   [curtosis] [real] NULL,
   [entropy] [real] NULL,
   [class] [real] NULL
) ON [PRIMARY]
GO

Next, import the CSV file that you downloaded into the table named "banknote" in the BANK database.

Once the CSV file has been loaded into the "banknote" table, let’s execute a simple SELECT statement to see how the dataset looks:

SELECT * FROM banknote

Screenshot of the output looks like the image below. Our task is to predict the class of the currency note, depending upon the variance, skewness, curtosis, and entropy of the banknote.

predict the class of the currency note, depending upon the variance, skewness, curtosis, and entropy of the banknote

Dividing Data into Training and Test Set for Machine Learning

A machine learning algorithm is trained on one portion of data, called a training set and is evaluated on another portion of data called the test set. We will train and test our algorithm on 90% training data and 10% test data. To divide the data into training and test sets, execute the following script:

SELECT TOP 10 PERCENT * 
 INTO banknote_test
 FROM banknote
SELECT * 
 INTO banknote_train
 FROM banknote
 EXCEPT
 SELECT TOP 10 PERCENT * FROM banknote

In the script above, we split our "banknote" table into two tables: "banknote_test" which contains 10% of the data and "banknote_train" which contains 90% of the data.

Creating a Table to Store Models

Next, we need to define the table that will contain our trained model. Execute the following script:

DROP TABLE IF EXISTS banknote_models;
GO
CREATE TABLE banknote_models (
  model_name VARCHAR(50) NOT NULL DEFAULT('default model') PRIMARY KEY,
  model VARBINARY(MAX) NOT NULL
);
GO

The script above will create the "banknote_models" table in the "BANK" database. The script will delete any existing table with the same name. Remember, our model will be stored in the model attribute and the type of model will be varbinary(MAX).

Create Stored Procedure for Training the Model

The next step is to create a stored procedure that will train the model on training data. Execute the following script:

CREATE PROCEDURE create_banknote_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXEC sp_execute_external_script @language = N'Python',
@script = N'
import pickle 
from sklearn.ensemble import RandomForestClassifier 
clf = RandomForestClassifier() 
trained_model = pickle.dumps(clf.fit(banknote_train[[0,1,2,3]], banknote_train[[4]].values.ravel())) 
' 
, @input_data_1 = N'select "variance", "skewness", "curtosis", "entropy", "class" from banknote_train'
, @input_data_1_name = N'banknote_train'
, @params = N'@trained_model varbinary(max) OUTPUT'
, @trained_model = @trained_model OUTPUT;
END;
GO

Take a careful look at the script above. In the script above we create "create_bank_note" stored procedure which accepts one parameter of type varbinary (max). Inside the script, we execute Python script. We use RandomForestClassifier from the sklearn.ensemble library to train our model. To train the model we need to call the fit method of the RandomForestClassifier and pass it both the attributes and the output class. Since in our dataset column 0-3 contains features, we pass these columns in place of feature and column 4 contains the output class, we pass it as the class to predict. Finally, we specify the columns and the dataset that we are going to use to train our model.

Once, you create the above stored procedure, you need to execute the stored procedure using the following command:

DECLARE @model varbinary(max);
DECLARE @new_model_name varchar(50)
SET @new_model_name = 'Random Forest'
EXEC create_banknote_model @model OUTPUT;
DELETE banknote_models WHERE model_name = @new_model_name;
INSERT INTO banknote_models (model_name, model) values(@new_model_name, @model);
GO

Creating Stored Procedure for Making Predictions

To make predictions on the test set i.e. "Banknote_test" table, we will again make use of the stored procedure. Execute the following script:

CREATE PROCEDURE predict_banknote_class (@model varchar(100))
AS
BEGIN
DECLARE @rf_model varbinary(max) = (SELECT model FROM banknote_models WHERE model_name = @model);
EXEC sp_execute_external_script @language = N'Python', 
@script = N'
import pickle
banknote_model = pickle.loads(rf_model)
note_pred = banknote_model.predict(banknote_test[[1,2,3,4]])
note_actual = banknote_test[''class''].tolist()
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(note_actual, note_pred)
print(accuracy)
'
, @input_data_1 = N'select "variance", "skewness", "curtosis", "entropy", "class" from banknote_test'
, @input_data_1_name = N'banknote_test'
, @params = N'@rf_model varbinary(max)'
, @rf_model = @rf_model
END;
GO

In the script above, we create "predict_banknote_class" stored procedure which predicts the class of the test data in the "banknote_test" table. In the above stored procedure, we first select the trained model that we created using the "create_banknote_mode" stored procedure. This stored procedure will be used for making predictiosn on the test set. Next, we simply define the data that we want our model to make predictions on. Finally, the stored procedure prints the class percentage of correct predictions. The accuracy_score class of the sklearn.metric module is used to calculate the accuracy of the predictions.

Making Predictions with the Machine Learning Stored Procedure

To make actual predictions, we only need to execute the "predict_banknote_class" stored procedure that we created in the last section. We need to pass the name of the algorithm that we want to use for making predictions which is "Random Forest" in our case. Execute the following script:

EXEC predict_banknote_class 'Random Forest';
GO

The output looks like this:

Output from the predictions stored procedure

The predicted accuracy is 87.68% which is impressive given the fact that we only had around 1372 records in the dataset.

References

Next Steps

Reduce the training size to 80% and increase the test size to 20% to see if you obtain different results
Use algorithms other than Random Forest, such as support vector machine, logistic regression, and Naive Bayes to see if you obtain better results.
Find a regression problem such as House Price prediction and see if SQL Server can help solve regression problems.
Check out these resources:

Ben Richardson

Ben is the owner of Acuity Training an IT training business based in the UK. Acuity offers a wide variety of IT training courses including a full range of SQL training. These range from introductory querying courses through to advanced administration and BI courses. He is a relatively recent convert to MSSQL Server and been working with it for 6 years. He was a previously a venture capitalist investing in software businesses for a number of years and before that a banker and accountant. He blogs occasionally on a variety of IT related topics at Acuity’s blog.

4 Comments

Mahmoud
May 2, 2024 / 12:24 pm Reply
Before I comment, I wanna thanks you to Sharing this code that help me in ML Startup and deep undersand for the process .
The Problem: When installing scipy = 1.13 and numpy = 1.26.4, I got error Key and version incompatible.
The Solution: I replace the code below :
trained_model = pickle.dumps(clf.fit(banknote_train[[0,1,2,3]], banknote_train[[4]].values.ravel()))
With:
trained_model = pickle.dumps(clf.fit(banknote_train.iloc[:, :4], banknote_train.iloc[:, 4].values.ravel()))
Bobby
May 28, 2022 / 8:34 pm Reply
Getting this error below, any help is much appreciated
An external script error occurred:
Error in execution. Check the output for more information.
Traceback (most recent call last):
File “<string>”, line 5, in <module>
File “C:\ProgramData\MSSQLSERVER\Temp-PY\Appcontainer1\2672D764-385B-4DE3-8CDC-9838B8B6B124\sqlindb_0.py”, line 37, in transform
banknote_model = pickle.loads(rf_model)
TypeError: a bytes-like object is required, not ‘NoneType’
SqlSatelliteCall error: Error in execution. Check the output for more information.
nadin
April 28, 2021 / 1:04 pm Reply
Please replace this line in the create_banknote_model procedure:
trained_model = pickle.dumps(clf.fit(banknote_train[[0,1,2,3]], banknote_train[[4]].values.ravel()))
with:
trained_model = pickle.dumps(clf.fit(banknote_train[[“variance”, “skew”, “curtosis”, “entropy”]], banknote_train[[“class”]].values.ravel()))
Marco Chou
January 2, 2021 / 12:18 pm Reply
When I ran Execute the stored procedure:
DECLARE @model varbinary(max);
DECLARE @new_model_name varchar(50)
SET @new_model_name = ‘Random Forest’
EXEC create_banknote_model @model OUTPUT;
DELETE banknote_models WHERE model_name = @new_model_name;
INSERT INTO banknote_models (model_name, model) values(@new_model_name, @model);
an error came up …
Msg 39004, Level 16, State 20, Line 1
A ‘Python’ script error occurred during execution of ‘sp_execute_external_script’ with HRESULT 0x80004004.
Msg 39019, Level 16, State 2, Line 1
An external script error occurred:
Error in execution. Check the output for more information.
Traceback (most recent call last):
File “<string>”, line 5, in <module>
File “C:\ProgramData\MSSQLSERVER\Temp-PY\Appcontainer1\2D38D111-F0F3-4AD1-A68C-909EF1638345\sqlindb_0.py”, line 41, in transform
trained_model = pickle.dumps(clf.fit(banknote_train[[0,1,2,3]], banknote_train[[4]].values.ravel()))
File “C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\frame.py”, line 2682, in __getitem__
return self._getitem_array(key)
File “C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\frame.py”, line 2726, in _getitem_array
indexer = self.loc._convert_to_indexer(key, axis=1)
File “C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\indexing.py”, line 1327, in _convert_to_indexer
.format(mask=objarr[mask]))
Msg 39019, Level 16, State 2, Line 1
An external script error occurred:
KeyError: ‘[0 1 2 3] not in index’
SqlSatelliteCall error: Error in execution. Check the output for more information.
STDOUT message(s) from external script:
SqlSatelliteCall function failed. Please see the console output for more information.
Traceback (most recent call last):
File “C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py”, line 605, in rx_sql_satellite_call
rx_native_call(“SqlSatelliteCall”, params)
File “C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\revoscalepy\RxSerializable.py”, line 375, in rx_native_call
ret = px_call(functionname, params)
RuntimeError: revoscalepy function failed.
(0 rows affected)
Msg 515, Level 16, State 2, Line 6
Cannot insert the value NULL into column ‘model’, table ‘BANK.dbo.banknote_models’; column does not allow nulls. INSERT fails.
The statement has been terminated.
Any suggestions?