By: Vitor Montalvao | Comments | Related: > SQL Server 2016
Problem
The sys.sp_execute_external_script is a new system stored procedure introduced in SQL Server 2016. It intends to execute any provided script at an external location. Microsoft says that the script must be written in a supported and registered language, but actually the only supported language is R. How can we use this new system stored procedure? Can you provide some examples?
Solution
To be able to use the sp_execute_external_script stored procedure, you need to enable it first by running the following statement:
EXEC sp_configure 'external scripts enabled', 1 GO RECONFIGURE WITH OVERRIDE GO
For the changes take effect, restart the SQL Server service for the respective SQL Server instance.
Users that needs to execute this stored procedure, are required to have EXECUTE ANY EXTERNAL SCRIPT database permission.
Limitations
The sp_execute_external_script stored procedure has limitations and you need to know them to be able to work with this stored procedure:
- Only supports R language scripts
- Results to be handled by SQL Server need to be returned as data frame
- Input data should be provided as SELECT command. For example, an execution of a stored procedure that returns data cannot be used.
For more limitations please check the Restrictions section for the sys.sp_execute_external_script MSDN article.
Working with sp_execute_external_script
The sp_execute_external_script stored procedure has some parameters and some particularities on how to work with so I will try to explain with examples, starting from the more basic and simple example and introducing more parameters and options to the examples to cover more complex scenarios.
Language and Script Parameters
The sp_execute_external_script stored procedure has only two parameters that are always required: (@language and @script). The rest of the parameters are optional.
@language is a nvarchar(128) variable and it is used to store the supported script language. At the time of writing, the only valid value for this parameter is 'R'.
@script is a nvarchar(max) variable and it is used to store the external language code so it is here that you will type your external code.
Below is the classic example of "Hello world" to show the simplest usage of the sp_execute_external_script stored procedure. It only returns a text message to the SQL Server by using the R cat function:
Input Parameters
To work with SQL Server data you need two parameters (@input_data_1 and @input_data_1_name).
@input_data_1 is a nvarchar(max) variable and it is used to store the SELECT command.
@input_data_1_name is a nvarchar(128) variable and it is used to give a name for the above SELECT command. This name is how it will be recognized by the external script stored in the @script parameter.
Follow is a simple example of how to use these parameters in the sp_execute_external_script stored procedure. Check how the provided name in the @input_data_1_name parameter is being used in the @script parameter (yellow highlight) and how to reference the query field (green highlight):
Output Parameter
The sp_execute_external_script stored procedure returns data in a data frame. The default data frame name is OutputDataSet. If you want to change the data frame name you need to provide the desired name with the @output_data_1_name parameter (a nvarchar(128) variable).
The following is an example with the default data frame name (highlighted) where there is no need to provide the @output_data_1_name parameter:
And below is the same query with a non default data frame name. You can see that the name should match the one provided in the @output_data_1_name parameter (highlighted) otherwise it will return an error:
Additional Parameters
The sp_execute_external_script stored procedure allows you to pass parameters to the external script and T-SQL scripts provided respectively by @script and @input_data_1 parameters.
The following example includes two parameters in the @params parameter. The example is very easy to understand. We want to apply a provided discount (15% in this case) to all customers that spent more than 10000. The T-SQL script will return those customers and the R script will apply the discount.
The first parameter (yellow highlight) will be used in the T-SQL script (@input_data_1 parameter). This provides a name for the parameter and the data type. Next define the value in a separated parameter.
The second parameter (green highlight) will be used in the R script (@script parameter). This provides a name for the parameter and the data type. Next define the value in a separated parameter. Check how it lost the '@' when used inside the R script.
NOTE: The order of the parameters are not relevant.
As previously mentioned, the external script returns only a data frame, but it is possible to have scalar values returned by using OUTPUT parameters. For that you just add the parameters as explained above and add the OUTPUT keyword in the parameter definition.
Evaluating the previous example, we want to know the total of the discounted amount. For that we create an OUTPUT parameter where we will store the sum of all discounts highlighted below. The respective variable needs to be created in the T-SQL code and then passed to our external script where it will have a value stored and returned so we will be able to use it later.
NOTE: Again, the order of the parameters are not relevant.
Results Set Clause
Besides the parameters, the sp_execute_external_script stored procedure has a results clause (WITH RESULT SETS). This clause can accept 3 values:
- UNDEFINED
- ((<result_sets_definition>))
- NONE
UNDEFINED means that the results returned in the data frame will not have defined column names. This is the default behavior as can be seen by all examples presented until now and have it compared with the next one:
To define column names and data types, the ((<result_sets_definition>)) is needed. The number of columns and respective data types must match the ones returned by the data frame, otherwise it will raise an error. In the below example we just specify the names and data types for the respective returned columns:
NONE is used when we do not want a data frame to be returned. Let's imagine that we only care about the global total discounted amount and not the discount by customer. As shown in the example below, we do not have a data frame returned, but only the scalar value that we want to use to present the result.
NOTE: In the example below I commented the @output_data_1_name on purpose to make it obvious that it cannot be used together WITH RESULTS SET NONE clause. It will return an error if the parameter is available. And for the same reason the default OutputDataSet data frame cannot be used because it will try to return the data frame and an error will be thrown.
Next Steps
- Test out this code and parameters for yourself in a development environment.
- Think about potential use cases for using the R Language to improve your productivity.
- Learn more about SQL Server 2016 and the R Language.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips