SQL Server 2016 R Services: Executing R code in SQL Server

By:   |   Comments (14)   |   Related: > SQL Server 2016


Problem

SQL Server 2016 comes with the integration of the popular language R into the database engine and in this tip we look at executing R code directly in SQL Server.

Solution

SQL Server 2016 preview

At the time of writing, SQL Server 2016 is still in preview (at the time of writing CTP 3.1 has been released). This means that the features of R Services can still change and that functionality might change, disappear or be added.

Prerequisites

If you are interested in writing and executing R scripts in a client environment - the data science workstation - please refer to the tip SQL Server 2016 R Services - Executing R code using a Client Application. On the workstation, you have the option to let the R code be executed locally or inside the SQL Server database. In this tip, we'll talk about executing R scripts directly in the database engine using stored procedures. For this you need to install the Advanced Analytics Extensions feature and the necessary R distribution and packages. The tip SQL Server 2016 R Services: Guide for Server Configuration describes the installation and configuration process.

Introduction to sp_execute_external_script

SQL Server 2016 includes a new system stored procedure to enable you to execute external scripts inside the database engine. The stored procedure sp_execute_external_script loads and executes a script written in a supported language from an external location. The query tree is controlled by SQL Server. In order to execute external scripts, you need to enable this on the server with the following statement:

sp_configure 'external scripts enabled', 1;

The stored procedure accepts the following parameters:

  • @language: the supported language in which the script is written. For the moment, only "R" is supported.
  • @script: the external script that needs to be executed. This can be a literal or a variable and its data type is nvarchar(max).
  • @input_data_1: specifies the input data for the R script. This is a T-SQL statement returning a result set. The data type is nvarchar(max). If you don't need input data, the parameter must be declared as an empty string.
  • @input_data_1_name: the name of the variable in the R script that will contain the result set of the input query. In the R script, the variable must be of type data frame. This parameter is optional and the default is inputDataSet.
  • @output_data_1_name: specifies the name of the variable in the R script that contains the output data. In the R script, the variable must be of type data frame. This parameter is optional and the default is outputDataSet.

The documentation also mentions the ability to use parameters to be used by the external script - something like @param1= 'inputValue' - but currently the syntax section doesn't show you can use this.

There are a couple of SQL Server data types which you cannot use in the input query to R. These are: cursor, timestamp, datetime2, datetimeoffset, time, sqlvariant, text, image, xml, hierarchyid, geometry, geography, CLR types and all Unicode types. All of these data types should be casted to a supported data type in the SQL statement. Also, not all datetime values are supported. The allowed range in SQL Server is larger than the one in R. If you have datetime values outside of the R range, these will be converted to NA.

On the other hand, some float values of R are not supported in SQL Server: +Inf, -Inf and NaN. These will give errors in SSMS, so you should convert those to NULL first.

If the R script returns a result set, you must also specify the WITH RESULTS SET clause.

Examples

Let's put this new stored procedure to work. First, let's calculate the quantiles of the various ages of the AdventureWorks customers. This can easily be done with the R function quantile. As input to the R script, we have a simple T-SQL statement:

SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE())
FROM [AdventureWorksDW2014].[dbo].[DimCustomer];

The stored procedure looks like this:

-- calculate simple quantiles
EXEC sp_execute_external_script
      @language = N'R'
     ,@script = N' res <-quantile(InputDataSet$Ages);
                   print(res);'
     ,@input_data_1 = N'SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE())
                        FROM [AdventureWorksDW2014].[dbo].[DimCustomer];'
;

Remember, if the name of the input data is not specified, InputDataSet is used. The construct InputDataSet$Ages allows us to access the Ages column of the InputDataSet data frame. The following result is returned to SQL Server:

Example1

You could also send the results back to SQL Server. The result should be a data frame or an error is returned.

Example2

The function data.frame can be used to convert the result to a data frame. The script becomes:

-- calculate simple quantiles
EXEC sp_execute_external_script
      @language = N'R'
     ,@script = N' res <-quantile(InputDataSet$Ages);
                   df <- data.frame(res);'
     ,@input_data_1 = N'SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE())
                        FROM [AdventureWorksDW2014].[dbo].[DimCustomer];'
     ,output_data_1_name = N'df'
;

However, a WITH RESULT SETS clause should be added, or another error is thrown:

Example3

The final script:

-- calculate simple quantiles
EXEC sp_execute_external_script
      @language = N'R'
     ,@script = N' res <-quantile(InputDataSet$Ages);
                   df <- data.frame(res);'
     ,@input_data_1 = N'SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE())
                        FROM [AdventureWorksDW2014].[dbo].[DimCustomer];'
     ,output_data_1_name = N'df'
WITH RESULT SETS (("res" int not null));

Example4

You can load the data into a table by using the INSERT ... EXEC statement. However, you cannot specify a WITH RESULT SETS clause with an INSERT ... EXEC statement, but SQL Server doesn't seem bothered when you leave it out:

Example5

In the WITH RESULT SETS clause, you can also specify more complex structures, such as models and plots, which are usually returned as varbinary(max). In the documentation you can find an example that creates a model. The plot result is described in an upcoming tip where we'll integrate R plots into Reporting Services reports.

Conclusion

With the sp_execute_external script stored procedure, you can execute R scripts inside the SQL Server database engine. You can use T-SQL queries to fetch input data and you can either display the results or return an actual result set. There are a couple of limitations, but nothing too serious.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, November 30, 2018 - 12:25:34 PM - Juan Back To Top (78376)

 Hi Koen,

NVARCHAR did the trick, however, in my case, it was necessary also to cast the date column in the dataframe as factor.

Thanks a lot!!!


Friday, November 30, 2018 - 9:05:33 AM - Koen Verbeeck Back To Top (78371)

As a follow-up, I asked someone on Twitter (who has more experience with R and SQL Server) about your issues, and this was his answer: "Nvarchar in result sets from external procedure should do the job. Also check the column matching from outputdataset and result sets. These must be aligned as well."

https://twitter.com/tomaz_tsql/status/1068481800945758208


Friday, November 30, 2018 - 2:07:32 AM - Koen Verbeeck Back To Top (78367)

Hi Juan (and Nick),

this error is thrown because for some reason, SQL Server thinks the result from R is a float, not a date. Without having actual code to look at, it's hard to say what causes this.

Regards,
Koen


Wednesday, November 28, 2018 - 10:58:52 PM - Juan Back To Top (78354)

Hi,

I'm dealing with the same issue reported by Nick...How can I fix the situation with dates between R and SQLServer?

EXECUTE statement failed because its WITH RESULT SETS clause specified type 'date' for column #2 in result set #1, and the corresponding type sent at run time was 'float'; there is no conversion between the two types.

Thanks 

 


Monday, April 24, 2017 - 5:43:42 PM - Nick Back To Top (55152)

 Hi Koen,

 

My output from R contains a date column in the '2015-01-01' format.

 

I am trying to output this to my results set, but SQL Server giving error message.  Have you encountered this before?  Cannot find anything online...

 

Thanks.

 

Nick

 

  WITH RESULT SETS (

  ([ForecastedTotalAmount] int NOT NULL

,[BusinessDate] date NOT NULL)

)

END

 

EXECUTE statement failed because its WITH RESULT SETS clause specified type 'date' for column #2 in result set #1, and the corresponding type sent at run time was 'float'; there is no conversion between the two types.

 

 


Wednesday, April 5, 2017 - 6:34:51 AM - Yang Back To Top (54277)

Hi Koen,

 

Thank you very much. It turned out to be some user credential issues for me. Now it's solved. Both odbcConnect funtion in RODBC pacage and RxSqlServerData function in RevoScaleR package worked well for me. 

 


Tuesday, March 28, 2017 - 4:28:22 AM - Koen Verbeeck Back To Top (53795)

If the R code is run on another machine, you'll need to create the ODBC connection there as well.
There are other ways to read SQL data with R, there are plenty of examples on the web.


Friday, March 24, 2017 - 1:43:53 AM - Yang Back To Top (51578)

 Hi Koen,

 

I am using remote desktop with Window OS. I think it may be another machine running the SQL Server.

Instead of setting up ODBC connection and using rxodbcData, is there any other way to call tables in the SQL Server when I run R codes in SQL server?

 


Thursday, March 23, 2017 - 9:31:23 AM - Koen Verbeeck Back To Top (51564)

Hi Yang,

you probably want to escape that backslash.

Are you running R Studio on the same machine as SQL Server?


Wednesday, March 22, 2017 - 7:59:15 PM - Yang Back To Top (51548)

Hi Koen,

 

Thank you for your reply. I tried RevoScaleR package. It worked if I use R studio. When I copied the R codes into SQL server, it did not work. The following is my Code.

EXEC sp_execute_external_script

 @language = N'R'  

,@script = N' 

library(RevoScaleR)

VarImptSQL = "SELECT * FROM 

VarImportanceLastmonth"

VarImptDS2<- RxOdbcData(sqlQuery = VarImptSQL, 

                        

connectionString = "DSN=DataMining2;Uid=mlhd.bisupport;Pwd=passwork;")

VarImptFile <- RxXdfData("VarImptFromODBC.xdf")# Create an xdf file name

rxImport(VarImptDS2, VarImptFile, overwrite=TRUE)# Import the data into the xdf file

var_importance <- rxDataStep(VarImptFile)# Read xdf file into a data frame

'

,@output_data_1_name = N'var_importance'  

And here is the error in the Message tab:

An external script error occurred:

[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified

Could not open data source.

Error in doTryCatch(return(expr), name, parentenv, handler) :   

Could not open data source.

Calls: source ... tryCatch -> tryCatchList -> tryCatchOne -> doTryCatch -> .Call 

 

One thing maybe I should mentaion is that in the SQl server, my uid is GSAHS\mlhd.bisupport. But when I try to include "GSAHS\" in the R code, there is an error. And it works well in R studio when I just use the part after "\".

 


Wednesday, March 22, 2017 - 5:40:38 AM - Koen Verbeeck Back To Top (51526)

Hi Yang,

you could read out an extra dataset into a data frame using standard SQL capabilities in R.
You can find an example here (using the RevoScaleR packages):

 

https://msdn.microsoft.com/en-us/library/mt637370.aspx


Tuesday, March 21, 2017 - 12:59:10 AM - Yang Back To Top (51485)

 

Hi Koen,

Thank you for your article. It helped me a lot to star use SQL server R services. I have two SQL tables to be used as input data set in the R code. I can use @input_data_1 to pass one of the input data set. How can I pass the other table?

I read from one MSDN help page saying that"Only one input dataset can be passed as a parameter, and you can return only one dataset. However, you can call other datasets from inside your R code "(https://msdn.microsoft.com/en-us/library/mt793558.aspx). But I did not find a way to do it yet. 

Any suggestion? Than you very much in advance.


Thursday, March 10, 2016 - 6:42:16 AM - Koen Verbeeck Back To Top (40904)

Hi StuBioge,

the launchpad service is responsible for the integration of T-SQL and the R language.
Make sure that this service is running and that you did all of the post-deployment configuration steps.


Friday, February 26, 2016 - 12:58:55 PM - StuBioge Back To Top (40807)

When trying to run your first code example that calculates simple quantiles, I'm receiving the following error: "Msg 39011, Level 16, State 1, Line 1 SQL Server was unable to communicate with the LaunchPad service. Please verify the configuration of the service." Could you provide any insight into what would be causing this problem with LaunchPad?

 















get free sql tips
agree to terms