# How to Create a R Function in SQL Server

By: Matteo Lorini | Updated: 2020-11-04 | Comments | Related: More > R Language

##### Problem

I have a basic knowledge of R and wanted to know if it's possible to create custom R functions in SQL Server. If so, how this is done? Can you provide code samples?

##### Solution

In this article we will explore how to create custom R functions in SQL Server. One main item to keep in mind is that R considers function as objects therefore we can work with them the same way we work with other R objects and functions.

## R Functions in SQL Server

A typical function in R has a name, input parameters and a body.

Let's see an example of a function that converts temperature between Fahrenheit and Celsius according to an input parameter value of C or F.

fcTemperatureCCOnvert <- function(ConvertTo,mytemperature) { if(ConvertTo == "F") Newtemperature <- (mytemperature - 32) * 5 / 9 else Newtemperature <- mytemperature * 9 / 5 + 32 return(Newtemperature) } print (" Convert Fahrenheit to Celsius ") print (fcTemperatureCCOnvert("F",68) ) print (" Convert Celsius To Fahrenheit ") print (fcTemperatureCCOnvert("C",20)

We can explore the body of the function using the body() command.

body(fcTemperatureCCOnvert)

Using SQL Server Management Studio:

DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' fcTemperatureCCOnvert <- function(ConvertTo,mytemperature) { if(ConvertTo == "F") Newtemperature <- (mytemperature - 32) * 5 / 9 else Newtemperature <- mytemperature * 9 / 5 + 32 return(Newtemperature) } print (" Convert Fahrenheit to Celsius ") print (fcTemperatureCCOnvert("F",68) ) print (" Convert Celsius To Fahrenheit ") print (fcTemperatureCCOnvert("C",20) ) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO

The R language allow functions to have input parameters. When calling an R function we can pass the input parameters by full name, by position or by partial name. Let's see an example.

myADRS <- function(FamilyName, Adress, zip, state) { FullAddress <- c(FamilyName , Adress, zip, state) return(FullAddress) } print('Function call using arguments by position') print (myADRS('Test Person','51 Test dr Atlanta','30031','GA' )) print('Function call using arguments by Full name and Position') print (myADRS(FamilyName ='Test Person',zip='30031', '51 Test dr Atlanta','GA' )) print('Function call using arguments by Partial Name') print (myADRS(sta='GA', Fam ='Test Person',z='30031', '51 Test dr Atlanta' ))

Using SQL Server Management Studio:

DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' myADRS <- function(FamilyName, Adress, zip, state) { FullAddress <- c(FamilyName , Adress, zip, state) return(FullAddress) } print(''Function call using arguments by position'') print (myADRS(''Test Person'',''51 Test dr Atlanta'',''30031'',''GA'' )) print(''Function call using arguments by Full name and Position'') print (myADRS(FamilyName =''Test Person'',zip=''30031'', ''51 Test dr Atlanta'',''GA'' )) print(''Function call using arguments by Partial Name'') print (myADRS(sta=''GA'', Fam =''Test Person'',z=''30031'', ''51 Test dr Atlanta'' )) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO

## Example R Functions in SQL Server - Count, Mean, Median, Standard Deviation and Summarize

In the next example, I want to show how we can use a combination of R functions (Count, Mean, Median, Standard Deviation and Summarize) to compare 2012 and 2013 yearly monthly sales for sale agent 275.

Our input data comes from the AdvebtureWorks2014 database. I think that this example is a good one because it shows how to pass the output of T-SQL query to an R script and it also shows how R consumes the input data to return a final calculation.

The input query is simple, we basically query the SaleOrderHeader table of AdventureWoks2014 database to summarize the 2012 and 2013 sales of sales person 275.

select year(OrderDate) as SaleYear,month(OrderDate) as SaleMonth, sum(TotalDue) as SaleAmount from Sales.SalesOrderHeader where SalesPersonID=275 and year(OrderDate) in(2012,2013) group by year(OrderDate),month(OrderDate) order by year(OrderDate),month(OrderDate)

In the first part of our R script we take the input data produced by our query and we split them in two datasets, one that contains 2012 sales and the other with 2013 sales.

# Imput Query data SalesAnalysis <- InputDataSet # Split Sales by Year SalesSplit <- split(SalesAnalysis, SalesAnalysis$SaleYear) SalesBefore <- SalesSplit$"2012"$SaleAmount SalesAfter <- SalesSplit$"2013"$SaleAmount

In the second part of the script we group our data and display the record count, mean, median and standard deviation.

# Group the data dataAnalysys <- data.frame( group = rep(c("SalesBefore", "SalesAfter"), each = 12), weight = c(SalesBefore, SalesAfter) ) # Print our statistical results print( group_by(dataAnalysys, group) %>% summarise( count = n(), mean = mean(weight, na.rm = TRUE), median = median(weight, na.rm = TRUE), sd = sd(weight, na.rm = TRUE) ))

Our full script is the following:

DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' library("dplyr") # Imput Query data SalesAnalysis <- InputDataSet # Split Sales by Year SalesSplit <- split(SalesAnalysis, SalesAnalysis$SaleYear) SalesBefore <- SalesSplit$"2012"$SaleAmount SalesAfter <- SalesSplit$"2013"$SaleAmount # Group the data dataAnalysys <- data.frame( group = rep(c("SalesBefore", "SalesAfter"), each = 12), weight = c(SalesBefore, SalesAfter) ) # Print our statistical results print( group_by(dataAnalysys, group) %>% summarise( count = n(), mean = mean(weight, na.rm = TRUE), median = median(weight, na.rm = TRUE), sd = sd(weight, na.rm = TRUE) )) '; DECLARE @sqlscript NVARCHAR(MAX); SET @sqlscript = N' select year(OrderDate) as SaleYear,month(OrderDate) as SaleMonth, sum(TotalDue) as SaleAmount from Sales.SalesOrderHeader where SalesPersonID=275 and year(OrderDate) in (2012,2013) group by year(OrderDate),month(OrderDate) order by year(OrderDate),month(OrderDate)'; EXEC sp_execute_external_script @language = N'R', @script = @rscript, @input_data_1 = @sqlscript; GO

**Mean:** Mean is the sum of all your numbers and then divide by the number
of elements.

- We can conclude that sales person 275 had a mean or average sale of $374,245.2 in 2013 and $317,191.5 in 2012, so the sales average has increased.

**Median:** Median is the middle value in the list of numbers. In our example our
sales person median 2013 sales of $363,827.30 has increased over 2012 median sales
of $288,849.50.

**Standard Deviation:** Standard deviation indicates how spread out the data is.
It is a measure of how far each observed value is from the mean. Our example indicates
that 2012 sales are more spread out from the mean than 2013 sales.

## Conclusion

In this tip we have learned how to create and use simple functions in R. In a future tip we will see how to create R stored procedures.

##### Next Steps

- The reader will need to install RStudio in order to test this tip.
- R Tutorial
- Check out these tips

##### About the author

**View all my tips**

Article Last Updated: 2020-11-04