By: Koen Verbeeck | Updated: 2016-02-01 | Comments (14) | Related: More > SQL Server 2016
SQL Server 2016 comes with the integration of the popular language R into the database engine and in this tip we look at executing R code directly in SQL Server.
SQL Server 2016 preview
At the time of writing, SQL Server 2016 is still in preview (at the time of writing CTP 3.1 has been released). This means that the features of R Services can still change and that functionality might change, disappear or be added.
If you are interested in writing and executing R scripts in a client environment - the data science workstation - please refer to the tip SQL Server 2016 R Services - Executing R code using a Client Application. On the workstation, you have the option to let the R code be executed locally or inside the SQL Server database. In this tip, we'll talk about executing R scripts directly in the database engine using stored procedures. For this you need to install the Advanced Analytics Extensions feature and the necessary R distribution and packages. The tip SQL Server 2016 R Services: Guide for Server Configuration describes the installation and configuration process.
Introduction to sp_execute_external_script
SQL Server 2016 includes a new system stored procedure to enable you to execute external scripts inside the database engine. The stored procedure sp_execute_external_script loads and executes a script written in a supported language from an external location. The query tree is controlled by SQL Server. In order to execute external scripts, you need to enable this on the server with the following statement:
sp_configure 'external scripts enabled', 1;
The stored procedure accepts the following parameters:
- @language: the supported language in which the script is written. For the moment, only "R" is supported.
- @script: the external script that needs to be executed. This can be a literal or a variable and its data type is nvarchar(max).
- @input_data_1: specifies the input data for the R script. This is a T-SQL statement returning a result set. The data type is nvarchar(max). If you don't need input data, the parameter must be declared as an empty string.
- @input_data_1_name: the name of the variable in the R script that will contain the result set of the input query. In the R script, the variable must be of type data frame. This parameter is optional and the default is inputDataSet.
- @output_data_1_name: specifies the name of the variable in the R script that contains the output data. In the R script, the variable must be of type data frame. This parameter is optional and the default is outputDataSet.
The documentation also mentions the ability to use parameters to be used by the external script - something like @param1= 'inputValue' - but currently the syntax section doesn't show you can use this.
There are a couple of SQL Server data types which you cannot use in the input query to R. These are: cursor, timestamp, datetime2, datetimeoffset, time, sqlvariant, text, image, xml, hierarchyid, geometry, geography, CLR types and all Unicode types. All of these data types should be casted to a supported data type in the SQL statement. Also, not all datetime values are supported. The allowed range in SQL Server is larger than the one in R. If you have datetime values outside of the R range, these will be converted to NA.
On the other hand, some float values of R are not supported in SQL Server: +Inf, -Inf and NaN. These will give errors in SSMS, so you should convert those to NULL first.
If the R script returns a result set, you must also specify the WITH RESULTS SET clause.
Let's put this new stored procedure to work. First, let's calculate the quantiles of the various ages of the AdventureWorks customers. This can easily be done with the R function quantile. As input to the R script, we have a simple T-SQL statement:
SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE()) FROM [AdventureWorksDW2014].[dbo].[DimCustomer];
The stored procedure looks like this:
-- calculate simple quantiles EXEC sp_execute_external_script @language = N'R' ,@script = N' res <-quantile(InputDataSet$Ages); print(res);' ,@input_data_1 = N'SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE()) FROM [AdventureWorksDW2014].[dbo].[DimCustomer];' ;
Remember, if the name of the input data is not specified, InputDataSet is used. The construct InputDataSet$Ages allows us to access the Ages column of the InputDataSet data frame. The following result is returned to SQL Server:
You could also send the results back to SQL Server. The result should be a data frame or an error is returned.
The function data.frame can be used to convert the result to a data frame. The script becomes:
-- calculate simple quantiles EXEC sp_execute_external_script @language = N'R' ,@script = N' res <-quantile(InputDataSet$Ages); df <- data.frame(res);' ,@input_data_1 = N'SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE()) FROM [AdventureWorksDW2014].[dbo].[DimCustomer];' ,output_data_1_name = N'df' ;
However, a WITH RESULT SETS clause should be added, or another error is thrown:
The final script:
-- calculate simple quantiles EXEC sp_execute_external_script @language = N'R' ,@script = N' res <-quantile(InputDataSet$Ages); df <- data.frame(res);' ,@input_data_1 = N'SELECT Ages = DATEDIFF(YEAR,[BirthDate],GETDATE()) FROM [AdventureWorksDW2014].[dbo].[DimCustomer];' ,output_data_1_name = N'df' WITH RESULT SETS (("res" int not null));
You can load the data into a table by using the INSERT ... EXEC statement. However, you cannot specify a WITH RESULT SETS clause with an INSERT ... EXEC statement, but SQL Server doesn't seem bothered when you leave it out:
In the WITH RESULT SETS clause, you can also specify more complex structures, such as models and plots, which are usually returned as varbinary(max). In the documentation you can find an example that creates a model. The plot result is described in an upcoming tip where we'll integrate R plots into Reporting Services reports.
With the sp_execute_external script stored procedure, you can execute R scripts inside the SQL Server database engine. You can use T-SQL queries to fetch input data and you can either display the results or return an actual result set. There are a couple of limitations, but nothing too serious.
- More tips on SQL Server R services
- For more info on R Services:
- For more SQL Server 2016, read these other SQL Server 2016 Tips.
Last Updated: 2016-02-01
About the author
View all my tips