Problem
I have a basic knowledge of R and wanted to know if it’s possible to create custom R functions in SQL Server. If so, how this is done? Can you provide code samples?
Solution
In this article we will explore how to create custom R functions in SQL Server. One main item to keep in mind is that R considers function as objects therefore we can work with them the same way we work with other R objects and functions.
R Functions in SQL Server
A typical function in R has a name, input parameters and a body.
Let’s see an example of a function that converts temperature between Fahrenheit and Celsius according to an input parameter value of C or F.
fcTemperatureCCOnvert <- function(ConvertTo,mytemperature) {
if(ConvertTo == "F")
Newtemperature <- (mytemperature - 32) * 5 / 9
else
Newtemperature <- mytemperature * 9 / 5 + 32
return(Newtemperature)
}
print (" Convert Fahrenheit to Celsius ")
print (fcTemperatureCCOnvert("F",68) )
print (" Convert Celsius To Fahrenheit ")
print (fcTemperatureCCOnvert("C",20)

We can explore the body of the function using the body() command.
body(fcTemperatureCCOnvert)

Using SQL Server Management Studio:
DECLARE @rscript NVARCHAR(MAX SET @rscript = N' fcTemperatureCCOnvert <- function(ConvertTo,mytemperature) { if(ConvertTo == "F") Newtemperature <- (mytemperature - 32) * 5 / 9 else Newtemperature <- mytemperature * 9 / 5 + 32 return(Newtemperature) } print (" Convert Fahrenheit to Celsius ") print (fcTemperatureCCOnvert("F",68) ) print (" Convert Celsius To Fahrenheit ") print (fcTemperatureCCOnvert("C",20) ) ' EXEC sp_execute_external_script @language = N'R', @script = @rscript GO

The R language allow functions to have input parameters. When calling an R function we can pass the input parameters by full name, by position or by partial name. Let’s see an example.
myADRS <- function(FamilyName, Adress, zip, state) {
FullAddress <- c(FamilyName , Adress, zip, state)
return(FullAddress)
}
print('Function call using arguments by position')
print (myADRS('Test Person','51 Test dr Atlanta','30031','GA' ))
print('Function call using arguments by Full name and Position')
print (myADRS(FamilyName ='Test Person',zip='30031', '51 Test dr Atlanta','GA' ))
print('Function call using arguments by Partial Name')
print (myADRS(sta='GA', Fam ='Test Person',z='30031', '51 Test dr Atlanta' ))

Using SQL Server Management Studio:
DECLARE @rscript NVARCHAR(MAX SET @rscript = N' myADRS <- function(FamilyName, Adress, zip, state) { FullAddress <- c(FamilyName , Adress, zip, state) return(FullAddress) } print(''Function call using arguments by position'') print (myADRS(''Test Person'',''51 Test dr Atlanta'',''30031'',''GA'' )) print(''Function call using arguments by Full name and Position'') print (myADRS(FamilyName =''Test Person'',zip=''30031'', ''51 Test dr Atlanta'',''GA'' )) print(''Function call using arguments by Partial Name'') print (myADRS(sta=''GA'', Fam =''Test Person'',z=''30031'', ''51 Test dr Atlanta'' )) ' EXEC sp_execute_external_script @language = N'R', @script = @rscript GO

Example R Functions in SQL Server – Count, Mean, Median, Standard Deviation and Summarize
In the next example, I want to show how we can use a combination of R functions (Count, Mean, Median, Standard Deviation and Summarize) to compare 2012 and 2013 yearly monthly sales for sale agent 275.
Our input data comes from the AdvebtureWorks2014 database. I think that this example is a good one because it shows how to pass the output of T-SQL query to an R script and it also shows how R consumes the input data to return a final calculation.
The input query is simple, we basically query the SaleOrderHeader table of AdventureWoks2014 database to summarize the 2012 and 2013 sales of sales person 275.
select year(OrderDate) as SaleYear,month(OrderDate) as SaleMonth, sum(TotalDue) as SaleAmount from Sales.SalesOrderHeader where SalesPersonID=275 and year(OrderDate) in(2012,2013) group by year(OrderDate),month(OrderDate) order by year(OrderDate),month(OrderDate)

In the first part of our R script we take the input data produced by our query and we split them in two datasets, one that contains 2012 sales and the other with 2013 sales.
# Imput Query data SalesAnalysis <- InputDataSet # Split Sales by Year SalesSplit <- split(SalesAnalysis, SalesAnalysis$SaleYear) SalesBefore <- SalesSplit$"2012"$SaleAmount SalesAfter <- SalesSplit$"2013"$SaleAmount
In the second part of the script we group our data and display the record count, mean, median and standard deviation.
# Group the data
dataAnalysys <- data.frame(
group = rep(c("SalesBefore", "SalesAfter"), each = 12),
weight = c(SalesBefore, SalesAfter)
)
# Print our statistical results
print(
group_by(dataAnalysys, group) %>%
summarise(
count = n(),
mean = mean(weight, na.rm = TRUE),
median = median(weight, na.rm = TRUE),
sd = sd(weight, na.rm = TRUE)
))
Our full script is the following:
DECLARE @rscript NVARCHAR(MAX SET @rscript = N' library("dplyr") # Imput Query data SalesAnalysis <- InputDataSet # Split Sales by Year SalesSplit <- split(SalesAnalysis, SalesAnalysis$SaleYear) SalesBefore <- SalesSplit$"2012"$SaleAmount SalesAfter <- SalesSplit$"2013"$SaleAmount # Group the data dataAnalysys <- data.frame( group = rep(c("SalesBefore", "SalesAfter"), each = 12), weight = c(SalesBefore, SalesAfter) ) # Print our statistical results print( group_by(dataAnalysys, group) %>% summarise( count = n(), mean = mean(weight, na.rm = TRUE), median = median(weight, na.rm = TRUE), sd = sd(weight, na.rm = TRUE) )) ' DECLARE @sqlscript NVARCHAR(MAX SET @sqlscript = N' select year(OrderDate) as SaleYear,month(OrderDate) as SaleMonth, sum(TotalDue) as SaleAmount from Sales.SalesOrderHeader where SalesPersonID=275 and year(OrderDate) in (2012,2013) group by year(OrderDate),month(OrderDate) order by year(OrderDate),month(OrderDate)' EXEC sp_execute_external_script @language = N'R', @script = @rscript, @input_data_1 = @sqlscript GO

Mean: Mean is the sum of all your numbers and then divide by the number of elements.
- We can conclude that sales person 275 had a mean or average sale of $374,245.2 in 2013 and $317,191.5 in 2012, so the sales average has increased.
Median: Median is the middle value in the list of numbers. In our example our sales person median 2013 sales of $363,827.30 has increased over 2012 median sales of $288,849.50.
Standard Deviation: Standard deviation indicates how spread out the data is. It is a measure of how far each observed value is from the mean. Our example indicates that 2012 sales are more spread out from the mean than 2013 sales.
Conclusion
In this tip we have learned how to create and use simple functions in R. In a future tip we will see how to create R stored procedures.
Next Steps
- The reader will need to install RStudio in order to test this tip.
- R Tutorial
- Check out these tips

Matteo is currently a DBA for one of the largest global electronic payment service providers in the U.S. He has been working in IT since he graduated from college in 1993. Matteo has a Master’s degree in Computer Science, and specializes in MSSQL with a good working knowledge of MySQL as well.
- MSSQLTips Awards: Rising Star (50+ tips) – 2021 | Author Contender – 2020