Basic R Concepts
By: Siddharth Mehta
R is a statistical programming language that provides different categories of functionality in libraries (also called packages). For applying statistical analysis, one often needs sample data. R ships with many real-life built-in sample datasets that can be used for analyzing the statistical computations and algorithms. To develop these computations, one needs to know regular programming constructs like variables, data types, operators, loops, etc.
Most of the programming constructs that are available in R are also available in T-SQL. Our intention is not to learn R in full, but to learn R constructs that enable us to consume the unique R libraries and data processing / computation mechanisms that are not available in T-SQL. In this lesson, we will be learning the basic concepts of R, just sufficient enough for us to apply R functions and packages against a SQL Server data repository.
R version, packages and datasets
We already learned in the last lesson how we can check the version of R server with which the database engine is communicating. It’s necessary to know the version of R you are working with, as that can be considered the basis of what is supported by a particular version of R server. Using sp_execute_external_script, with a simple R property “R.version”, we can check the details of the R version as shown below. The print function prints the output on the SSMS message console. If this command is executed on the R console, it would print the same output as shown below. In this lesson, our focus is developing fundamentals of R. We will discuss the details of sp_execute_external_script in the next lesson. Until then, consider this procedure as execution wrapper.
The next step is to explore the different default libraries available in Microsoft R Open server. You can explore them from here. You can load any given library by using the library function. We will look at an example of the use of this function very shortly.
After exploring the list of packages available in R, the next step is to explore the list of datasets that can you use. You can explore as well as download a list of datasets classified by packages from here.
Variables, Comments and Printing Data
In R, a variable is created by using the assignment operator “<-“. The data type of the variable is determined by the data stored in R. The code can be commented in R using the # character. Let’s understand these concepts with an example.
--Example: Variables execute sp_execute_external_script @language = N'R', @script = N' var1 <- "Siddharth" Var1 <- "Sid" var2 <- 100 var3 <- 50.5 var4 <- TRUE print(var1) print(Var1) print(var2 + var3) print(var4) print(class(var1)) print(class(var2)) print(class(var4)) '
Executing the above code, the output should look as shown below. Below are the points you can derive from the above example:
- Variables can be created using the “<-“(assignment) operator.
- Variables are case-sensitive. Var1 and var1 are considered different variables.
- The data-type of the variable is determined by the type of data stored in the variable.
- You can inquire about the value of variables using the print function
- The class function can be used on variables to determine the data type of the variable which is classified in three major types – character, numeric and logical.
- There are other data structure types too, but we will be limiting our discussing to these three basic types.
Arithmetic, Operators, Loops
The below table shows a list of arithmetic and logical operators in R. It’s not an exhaustive list, but covers major operators that you may use when you start learning R.
|<=||Less than or equal to|
|>=||Greater than or equal to|
|==||Exactly equal to|
|!=||Not equal to|
Though these operators should be easy to understand, below is a basic example of how you may use these operators.
Here we have used these operators on actual values. You can use these operators in the same way on variables too.
There is a high possibility that we may have to loop through the data for applying some statistical computations. So, we need to learn at least one looping technique in R. Below is a simple example of a while loop. In this example, we are assigning the value of 0 to variable i. We are printing the value of “i” in the loop and incrementing the value of i. We are also placing a condition that if value of i reaches 3, then break out of the loop using the “break” statement.
We have not learned anything new in this lesson that we cannot achieve with T-SQL. But once the data is submitted from SQL Server data tables / views to Microsoft R server for processing, we will need to use programming constructs in R language. In this lesson, we learned this at a very basic level. Now that we have some basic idea of how to apply some arithmetic on the data in R, let us learn how to accept and receive data between T-SQL and R in the next lesson.
- Consider implementing all the operators and practice using them with variables in R.