How to Create a R Function in SQL Server


By:   |   Updated: 2020-11-04   |   Comments   |   Related: More > R Language


Problem

I have a basic knowledge of R and wanted to know if it's possible to create custom R functions in SQL Server. If so, how this is done? Can you provide code samples?

Solution

In this article we will explore how to create custom R functions in SQL Server. One main item to keep in mind is that R considers function as objects therefore we can work with them the same way we work with other R objects and functions.

R Functions in SQL Server

A typical function in R has a name, input parameters and a body.

Let's see an example of a function that converts temperature between Fahrenheit and Celsius according to an input parameter value of C or F.

fcTemperatureCCOnvert <- function(ConvertTo,mytemperature) {
  if(ConvertTo == "F")
    Newtemperature <- (mytemperature - 32) * 5 / 9
  else
    Newtemperature <- mytemperature * 9 / 5 + 32
  return(Newtemperature)
}
print (" Convert Fahrenheit to Celsius ")
print (fcTemperatureCCOnvert("F",68) )
print (" Convert Celsius To Fahrenheit ")
print (fcTemperatureCCOnvert("C",20) 
r function example

We can explore the body of the function using the body() command.

body(fcTemperatureCCOnvert)
r function example

Using SQL Server Management Studio:

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
     fcTemperatureCCOnvert <- function(ConvertTo,mytemperature) {
    if(ConvertTo == "F")
      Newtemperature <- (mytemperature - 32) * 5 / 9
    else
      Newtemperature <- mytemperature * 9 / 5 + 32
    return(Newtemperature)
   }
   print (" Convert Fahrenheit to Celsius ")
   print (fcTemperatureCCOnvert("F",68) )
   print (" Convert Celsius To Fahrenheit ")
   print (fcTemperatureCCOnvert("C",20) )
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
r function example

The R language allow functions to have input parameters. When calling an R function we can pass the input parameters by full name, by position or by partial name. Let's see an example.

myADRS <- function(FamilyName, Adress, zip, state) {
     FullAddress <- c(FamilyName , Adress, zip, state)
     return(FullAddress)
}
print('Function call using arguments by position')
print (myADRS('Test Person','51 Test dr Atlanta','30031','GA' ))
print('Function call using arguments by Full name and Position')
print (myADRS(FamilyName ='Test Person',zip='30031', '51 Test dr Atlanta','GA' ))
print('Function call using arguments by Partial Name')
print (myADRS(sta='GA', Fam ='Test Person',z='30031', '51 Test dr Atlanta' ))
r function example

Using SQL Server Management Studio:

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
   myADRS <- function(FamilyName, Adress, zip, state) {
    FullAddress <- c(FamilyName , Adress, zip, state)
    return(FullAddress)
}
print(''Function call using arguments by position'')
print (myADRS(''Test Person'',''51 Test dr Atlanta'',''30031'',''GA'' ))
print(''Function call using arguments by Full name and Position'')
print (myADRS(FamilyName =''Test Person'',zip=''30031'', ''51 Test dr Atlanta'',''GA'' ))
print(''Function call using arguments by Partial Name'')
print (myADRS(sta=''GA'', Fam =''Test Person'',z=''30031'', ''51 Test dr Atlanta'' ))
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
r function example

Example R Functions in SQL Server - Count, Mean, Median, Standard Deviation and Summarize

In the next example, I want to show how we can use a combination of R functions (Count, Mean, Median, Standard Deviation and Summarize) to compare 2012 and 2013 yearly monthly sales for sale agent 275.

Our input data comes from the AdvebtureWorks2014 database. I think that this example is a good one because it shows how to pass the output of T-SQL query to an R script and it also shows how R consumes the input data to return a final calculation.

The input query is simple, we basically query the SaleOrderHeader table of AdventureWoks2014 database to summarize the 2012 and 2013 sales of sales person 275.

select year(OrderDate) as SaleYear,month(OrderDate) as SaleMonth, sum(TotalDue) as SaleAmount from Sales.SalesOrderHeader 
where SalesPersonID=275 and year(OrderDate) in(2012,2013)
group by year(OrderDate),month(OrderDate)
order by year(OrderDate),month(OrderDate)
sql query results

In the first part of our R script we take the input data produced by our query and we split them in two datasets, one that contains 2012 sales and the other with 2013 sales.

# Imput Query data
SalesAnalysis <- InputDataSet
# Split Sales by Year
SalesSplit <- split(SalesAnalysis, SalesAnalysis$SaleYear)
SalesBefore <- SalesSplit$"2012"$SaleAmount
SalesAfter <- SalesSplit$"2013"$SaleAmount

In the second part of the script we group our data and display the record count, mean, median and standard deviation.

# Group the data
   dataAnalysys <- data.frame( 
                group = rep(c("SalesBefore", "SalesAfter"), each = 12),
                weight = c(SalesBefore,  SalesAfter)
                )
    # Print our statistical results
   print(
   group_by(dataAnalysys, group) %>%
     summarise(
      count = n(),
      mean = mean(weight, na.rm = TRUE),
median = median(weight, na.rm = TRUE), 
      sd = sd(weight, na.rm = TRUE)
     ))

Our full script is the following:

  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
   library("dplyr")
   # Imput Query data
   SalesAnalysis <- InputDataSet
   # Split Sales by Year
   SalesSplit <- split(SalesAnalysis, SalesAnalysis$SaleYear)
   SalesBefore <- SalesSplit$"2012"$SaleAmount
   SalesAfter <- SalesSplit$"2013"$SaleAmount
   
   # Group the data
   dataAnalysys <- data.frame(
               group = rep(c("SalesBefore", "SalesAfter"), each = 12),
               weight = c(SalesBefore, SalesAfter)
               )
   # Print our statistical results
   print(
   group_by(dataAnalysys, group) %>%
    summarise(
      count = n(),
      mean = mean(weight, na.rm = TRUE),
      median = median(weight, na.rm = TRUE), 
      sd = sd(weight, na.rm = TRUE)
    ))
   ';
  DECLARE @sqlscript NVARCHAR(MAX);
  SET @sqlscript = N'
      select year(OrderDate) as SaleYear,month(OrderDate) as SaleMonth, sum(TotalDue) as SaleAmount from Sales.SalesOrderHeader
      where SalesPersonID=275 and year(OrderDate) in (2012,2013)
      group by year(OrderDate),month(OrderDate)
      order by year(OrderDate),month(OrderDate)';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript,
    @input_data_1 = @sqlscript; 
  GO
r function example

Mean: Mean is the sum of all your numbers and then divide by the number of elements.

  • We can conclude that sales person 275 had a mean or average sale of $374,245.2 in 2013 and $317,191.5 in 2012, so the sales average has increased.

Median: Median is the middle value in the list of numbers. In our example our sales person median 2013 sales of $363,827.30 has increased over 2012 median sales of $288,849.50.

Standard Deviation: Standard deviation indicates how spread out the data is. It is a measure of how far each observed value is from the mean. Our example indicates that 2012 sales are more spread out from the mean than 2013 sales.

Conclusion

In this tip we have learned how to create and use simple functions in R. In a future tip we will see how to create R stored procedures.

Next Steps


Last Updated: 2020-11-04


get scripts

next tip button



About the author
MSSQLTips author Matteo Lorini Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.

View all my tips



Comments For This Article





download





Recommended Reading

Import Data from PDF to SQL Server

Using Simple Linear Regression to Make Predictions

Data Structures in R including Vector, Matrix, Array, List, and Data Frame

Stored Procedure in SQL Server with R Code

Create SQL Server Graphics with ggplot2 and R Services














get free sql tips
agree to terms