By: Matteo Lorini
Problem
What types of data structures are available in R, and how do you use them in RStudio and in Microsoft SQL Server?
Solution
In this article, we will examine the main R data structures and provide examples of how to use them in both RStudio and SQL. The primary types of R data structures are Atomic Vector, Matrix, Array, List, and Data Frame.
Vectors
The R language provides two types of Vectors: Atomic Vectors and Lists. The main characteristic of an Atomic Vector is that all of its elements must be of the same type, while a List can have elements of different types.
Atomic Vector
The primary types of Atomic vectors are logical, integer, double, and character. Let us see how to define and use them.
Vectors are created using the R function c(), which stands for combine.
The sample code below shows how to create Atomic Vectors. Once a Vector is created, we can directly access a single element of it. For example, the second element of Vector chr_vct is the string "MSSQLTips". We can access it by typing chr_vct[2] or print(chr_vct[2]) in RStudio; in SSMS we must use print(chr_vct[2]).
To test whether a Vector is Atomic we can use is.atomic(). We can also use typeof() to identify the type of the Vector, length() to find the number of elements, and attributes() to show additional arbitrary metadata.
## Example of Atomic Vectors
# Integer
int_vct <- c(1L, 6L, 10L)
# Double
dbl_vct <- c(1, 2.5, 4.5)
# Logical
log_vct <- c(TRUE, FALSE, T, F)
# Character
chr_vct <- c("Hallo", "MSSQLTips", ".com")
# List all Vector elements
chr_vct
# List only the 2nd element
chr_vct[2]
# Test if the Vector is atomic
is.atomic(chr_vct)
# Test the Vector type
typeof(chr_vct)
# Show how many elements are in the Vector
length(chr_vct)
# Display the Vector attributes
attributes(chr_vct)
Please notice that when we copy the code from RStudio to SSMS, we have to wrap each expression in a print() function to have the results displayed in the SSMS Messages window.
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
## Example of Atomic Vectors
# Integer
int_vct <- c(1L, 6L, 10L)
# Double
dbl_vct <- c(1, 2.5, 4.5)
# Logical
log_vct <- c(TRUE, FALSE, T, F)
# Character
chr_vct <- c("Hallo", "MSSQLTips", ".com")
# List all Vector elements
print(chr_vct)
# List only the 2nd element
print(chr_vct[2])
# Test if the Vector is atomic
print(is.atomic(chr_vct))
# Test the Vector type
print(typeof(chr_vct))
# Show how many elements are in the Vector
print(length(chr_vct))
# Display the Vector attributes
print(attributes(chr_vct))
';
EXEC sp_execute_external_script
  @language = N'R',
  @script = @rscript;
GO
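The rule that every element of an Atomic Vector has the same type can be seen by mixing types inside c(). The following is a minimal sketch; the variable names mixed_num, mixed_lgl, and mixed_chr are made up for illustration. R does not raise an error here, it silently converts all elements to the most flexible type involved.

```r
# Mixing types in c() coerces every element to one common type
mixed_num <- c(1L, 2.5)                # integer + double
mixed_lgl <- c(TRUE, 2L)               # logical + integer
mixed_chr <- c(1, "MSSQLTips", TRUE)   # anything + character

typeof(mixed_num)   # "double"
typeof(mixed_lgl)   # "integer"
typeof(mixed_chr)   # "character"
mixed_chr           # "1" "MSSQLTips" "TRUE"
```

If the elements must keep their own types, a List is the appropriate structure.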
List
Lists are different from atomic vectors because their elements can be of any type, including lists. You create a list by using the list() command.
The example below creates a list with elements of different types. We use the R str() command to see the structure of any R object, lists included.
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
str(myList)
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
# str() writes its own output and returns NULL, so it needs no print() wrapper
str(myList)
';
EXEC sp_execute_external_script
  @language = N'R',
  @script = @rscript;
GO
The output of the str() command shows that our List is comprised of an Atomic Integer Vector, an Atomic (1 Element) Char Vector, an Atomic Logical Vector, and an Atomic numeric vector.
We can use the is.list() command to check whether an R object is a list, and we can use the typeof(), length(), and attributes() commands as well.
is.list(myList)
typeof(myList)
length(myList)
attributes(myList)
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
print(is.list(myList))
print(typeof(myList))
print(length(myList))
print(attributes(myList))
';
EXEC sp_execute_external_script
  @language = N'R',
  @script = @rscript;
GO
To access the elements of a list we can use the following syntax:
- myList[[2]] - returns the second element of the list, the string "MSSQLTips"
- myList[[3]] - returns the third element of the list, the Atomic Logical Vector TRUE FALSE TRUE
- myList[[3]][2] - returns the second element of that Atomic Logical Vector, the value FALSE
myList[[2]]
myList[[3]]
myList[[3]][2]
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
print(myList[[2]])
print(myList[[3]])
print(myList[[3]][2])
';
EXEC sp_execute_external_script
  @language = N'R',
  @script = @rscript;
GO
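The double brackets used above extract the element itself. Single brackets, by contrast, return a shorter list that still wraps the element. The short sketch below contrasts the two; sub_list and element are illustrative names only.

```r
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))

sub_list <- myList[2]     # single brackets: a list of length 1
element  <- myList[[2]]   # double brackets: the element itself

typeof(sub_list)          # "list"
typeof(element)           # "character"

# Single brackets can select several elements at once
length(myList[c(1, 3)])   # 2
```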
Attributes
All objects can have additional attributes, used to store metadata about the object. Attributes can be thought of as a named list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes().
Let us see an example of how to assign an attribute value to a List.
# Assign Attribute
attr(myList, "My Attribute") <- "My First Attribute"
# Display Attribute
attr(myList, "My Attribute")
attributes(myList)
str(attributes(myList))
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
# Assign Attribute
attr(myList, "My Attribute") <- "My First Attribute"
# Display Attribute
print(attr(myList, "My Attribute"))
print(attributes(myList))
# str() writes its own output, so it needs no print() wrapper
str(attributes(myList))
';
EXEC sp_execute_external_script
  @language = N'R',
  @script = @rscript;
GO
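Attributes are not limited to Lists. The sketch below, which assumes a small named numeric vector, shows that element names are themselves stored as an attribute, that a missing attribute is returned as NULL, and that assigning NULL removes an attribute. The attribute name "source" is hypothetical.

```r
vct <- c(a = 1, b = 2, c = 3)          # element names are stored as an attribute
vct_names <- attr(vct, "names")        # "a" "b" "c"

attr(vct, "source") <- "MSSQLTips"     # add a custom (hypothetical) attribute
attr_names <- names(attributes(vct))   # contains "names" and "source"

not_set <- attr(vct, "missing")        # NULL: no such attribute

attr(vct, "source") <- NULL            # assigning NULL removes the attribute
remaining <- names(attributes(vct))    # "names"
```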
Matrix and Array
A Matrix is a two-dimensional atomic array, commonly used as part of the mathematical machinery of statistics. A Matrix is created using the matrix() command, while an array is created using the array() command.
In the following example, we will create a matrix and execute a few basic operations on it.
# Create a new Matrix
myMatrix <- matrix(1:6, ncol = 3, nrow = 2)
# Display Matrix Values
myMatrix
# Display Matrix Structure
str(myMatrix)
# Display Matrix length and number of columns and rows
length(myMatrix)
nrow(myMatrix)
ncol(myMatrix)
# Assign a value to a Matrix Cell
myMatrix[2,2] <- 99
# Display Matrix Values
myMatrix
It is also possible to define a name for each row and column in a Matrix using the rownames() and colnames() functions.
rownames(myMatrix) <- c("X", "Y")
colnames(myMatrix) <- c("A", "B", "C")
myMatrix
Mathematical operations on Matrices are fairly easy to execute. For example, let us compute the sum of two Matrices.
# Define a New Matrix
myMatrix1 <- matrix(1:6, ncol = 3, nrow = 2)
myMatrix1
# Matrix Sum
m <- myMatrix1 + myMatrix
m
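The sum is computed element by element, so both Matrices must have the same dimensions. The following sketch rebuilds two small matrices under the illustrative names a and b, checks one cell of the result, and shows what happens when the shapes differ.

```r
a <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
b <- a
b[2, 2] <- 99

s <- a + b
s[2, 2]   # 4 + 99 = 103
dim(s)    # 2 3

# Matrices with different shapes cannot be added; capture the error message
bad <- tryCatch(a + matrix(1:6, nrow = 3, ncol = 2),
                error = function(e) conditionMessage(e))
bad       # an error message about non-conformable arrays
```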
We can execute the same type of code using SSMS.
An array is a vector with one or more dimensions. So, an array with one dimension is (almost) the same as a vector. An array with two dimensions is (almost) the same as a matrix. An array with three or more dimensions is an n-dimensional array.
Let us see it with an example.
# Example array code:
myarr <- array(0.0, 3)          # [0.0 0.0 0.0] Vector
print(myarr)

# Add a dimension and we get a Matrix
myarr <- array(0.0, c(2,3))     # 2x3 matrix
print(myarr)

# Add another dimension
myarr <- array(0.0, c(2,5,4))   # 2x5x4 n-array
print(myarr)                    # 40 values displayed
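The "(almost)" in the description above comes from the dim attribute that every array carries. A minimal sketch, using the illustrative names arr1, arr2, and arr3 and filling the last array with 1:40 so that individual cells are easy to identify:

```r
arr1 <- array(0.0, 3)
arr2 <- array(0.0, c(2, 3))
arr3 <- array(1:40, c(2, 5, 4))

dim(arr1)         # 3
is.array(arr1)    # TRUE
is.vector(arr1)   # FALSE: the dim attribute makes it only "almost" a vector
is.matrix(arr2)   # TRUE: a two-dimensional array is a matrix

dim(arr3)         # 2 5 4
length(arr3)      # 40
arr3[1, 1, 2]     # 11, the first cell of the second 2x5 slice
arr3[2, 5, 4]     # 40, the last cell
```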
Data Frames
A Data Frame is the most common way of storing and working with data in R. Data Frames are nothing more than a list of equal-length vectors, making them a 2-dimensional structure. Data Frames share the properties of both the matrix and list.
A Data Frame has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a Data Frame is the length of the underlying list, and so is the same as ncol(); nrow() gives the number of rows.
Let us see how to create and work with a Data Frame. Our first example creates a Data Frame of 5 observations (rows) and two variables, x and y.
# Create a Data Frame
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))
str(df)

# Test the following functions on the Data Frame
names(df)
colnames(df)
rownames(df)
length(df)
ncol(df)
nrow(df)
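Because a Data Frame is a list of equal-length vectors, it answers to both list-style and matrix-style access. The sketch below makes that dual nature visible. It passes stringsAsFactors = FALSE explicitly, which is an assumption added here so that column y is a character vector on every R version.

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

typeof(df)                           # "list"
class(df)                            # "data.frame"
is.list(df)                          # TRUE
length(df) == ncol(df)               # TRUE
identical(names(df), colnames(df))   # TRUE

df[["y"]][2]   # list-style access: "b"
df[2, "y"]     # matrix-style access: "b"
```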
Let us add a new column and a new row of data to the existing Data Frame using the cbind() and rbind() R functions.
# Add a new column
cbind(df, data.frame(z = 9:5))

# Add a new row
rbind(df, data.frame(x=9, y="Z"))
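Note that cbind() and rbind() return a new Data Frame and leave df itself unchanged, so the result has to be assigned to keep it. A minimal sketch, with wider and longer as illustrative names and stringsAsFactors = FALSE added as an assumption for consistent behavior across R versions:

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

wider  <- cbind(df, data.frame(z = 9:5))
longer <- rbind(df, data.frame(x = 9, y = "Z", stringsAsFactors = FALSE))

dim(df)       # 5 2 (the original is unchanged)
dim(wider)    # 5 3
dim(longer)   # 6 2

# Assign the result back to keep the new row
df <- longer
nrow(df)      # 6
```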
Data Frames are so important and flexible that we can have a list of vectors as a Data Frame column.
# Create a Data Frame
df <- data.frame(x = c("A","B","C"))
# Display the Data Frame
print(df)
# Display only column x of the Data Frame
print(df$x)
# Assign a list of Vectors as column y
df$y <- list(1:2, 3:5, 6:9)
print(df)
print(df$y)
We can also see how the above code works in SSMS.
Conclusion
In this tip we have learned the main R data structures and how to create and use them. In the next tip we will talk about subsetting these data structures and how to add, remove, and order data in a Data Frame.
Next Steps
- The reader will need to install RStudio in order to test this tip.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.