Data Structures in R including Vector, Matrix, Array, List, and Data Frame


Updated: 2020-07-23

Problem

What types of data structures are available in R, and how do you use them in RStudio and in Microsoft SQL Server?

Solution

In this article, we will examine the main R data structures and provide examples of how to use them both in RStudio and in SQL Server Management Studio (SSMS) through sp_execute_external_script. The primary R data structures are Atomic Vector, Matrix, Array, List, and Data Frame.

Vectors

The R language provides two types of Vectors: Atomic Vectors and Lists. The main characteristic of Atomic Vectors is that all elements must be of the same type, while the elements of a List can be of different types.
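To illustrate the difference, here is a small sketch (the variable names are our own) showing what happens when c() receives mixed types: R silently coerces every element to the most flexible common type, whereas list() keeps each element's original type.

```r
# c() coerces mixed inputs to one common type
mixed_vct <- c(1L, 2.5, TRUE)
typeof(mixed_vct)   # "double": 1L and TRUE are converted to numbers
mixed_vct           # 1.0 2.5 1.0

chr_mix <- c(1, "MSSQLTips", FALSE)
typeof(chr_mix)     # "character": every element becomes a string

# list() keeps each element's own type
mixed_list <- list(1L, 2.5, TRUE, "MSSQLTips")
sapply(mixed_list, typeof)   # "integer" "double" "logical" "character"
```

Coercion happens without a warning, so it is worth checking typeof() when a vector is built from mixed sources.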

Atomic Vector

The primary types of Atomic vectors are logical, integer, double, and character. Let us see how to define and use them.

Vectors are created using the R function c(), which stands for combine.

The sample code below shows how to create Atomic Vectors. Once a Vector is created, we can directly access any single element of it (R indexes start at 1). For example, the second element of Vector chr_vct is the string "MSSQLTips". We can access it directly by typing chr_vct[2] or print(chr_vct[2]) in RStudio, and print(chr_vct[2]) if we are using SSMS.

To test whether a Vector is Atomic we can use is.atomic(); we use typeof() to identify the type of the Vector, length() to find the number of elements, and attributes() to show any additional metadata attached to it.

## Example of Atomic Vectors

# Integer
int_vct <- c(1L, 6L, 10L)

# Double
dbl_vct <- c(1, 2.5, 4.5)

# Logical
log_vct <- c(TRUE, FALSE, T, F)

# Character
chr_vct <- c("Hallo", "MSSQLTips", ".com")

# List all Vector elements
chr_vct

# List only 2nd element 
chr_vct[2]

# Test if Vector is atomic
is.atomic(chr_vct)

# Test Vector type
typeof(chr_vct) 

# Show how many elements are in the vector
length(chr_vct)

# Display vector attributes
attributes(chr_vct)

Note that when we copy the code from RStudio to SSMS, we have to wrap each expression in print() to have the results displayed in the SSMS Messages window.

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
   ## Example of Atomic Vectors
   # Integer
   int_vct <- c(1L, 6L, 10L)
   # Double
   dbl_vct <- c(1, 2.5, 4.5)
   # Logical
   log_vct <- c(TRUE, FALSE, T, F)
   # Character
   chr_vct <- c("Hallo", "MSSQLTips", ".com")
   # List all Vector elements
   print(chr_vct)
   # List only 2nd element 
   print(chr_vct[2])
   # Test if Vector is atomic
   print(is.atomic(chr_vct))
   # Test Vector type
   print(typeof(chr_vct))
   # Show how many elements are in the vector
   print(length(chr_vct))
   # Display vector attributes
   print(attributes(chr_vct))
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
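As a quick check on the four vectors defined above, the sketch below (run in RStudio) confirms their types in one call and shows why attributes() returns NULL: a plain vector carries no attributes until we add one, for example names. The element names used here are a hypothetical illustration.

```r
int_vct <- c(1L, 6L, 10L)
dbl_vct <- c(1, 2.5, 4.5)
log_vct <- c(TRUE, FALSE, T, F)
chr_vct <- c("Hallo", "MSSQLTips", ".com")

# Type of each of the four vectors
vct_types <- sapply(list(int_vct, dbl_vct, log_vct, chr_vct), typeof)
vct_types   # "integer" "double" "logical" "character"

# A plain vector has no attributes
attributes(chr_vct)   # NULL

# Naming the elements adds a "names" attribute (hypothetical names)
names(chr_vct) <- c("greeting", "site", "domain")
attributes(chr_vct)   # $names: "greeting" "site" "domain"
chr_vct["site"]       # access by name instead of by position
```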

List

Lists are different from atomic vectors because their elements can be of any type, including lists. You create a list by using the list() command.

The example below creates a list with elements of different types. We use the R str() function to display the compact structure of any R object, lists included.

myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
str(myList)
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
   myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
   str(myList)
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO

The output of the str() command shows that our List consists of an integer Atomic Vector, a one-element character Atomic Vector, a logical Atomic Vector, and a numeric (double) Atomic Vector.

We can use the is.list() function to check whether an R object is a list, and we can use the typeof(), length(), and attributes() functions as well.

is.list(myList)
typeof(myList)
length(myList)
attributes(myList)
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
   myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
   print(is.list(myList))
   print(typeof(myList))
   print(length(myList))
   print(attributes(myList))
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
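Because a List is the second type of R Vector, the generic vector tests behave differently on it than on an Atomic Vector. The following sketch also shows the "lists can contain lists" case; the nested list and its field names are only an illustration.

```r
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
is.vector(myList)   # TRUE: a list is a (generic) vector
is.atomic(myList)   # FALSE: its elements may have different types

# A list can contain another list (illustrative field names)
nested <- list(id = 1L, tags = list("R", "SQL"))
typeof(nested$tags)   # "list"
length(nested)        # 2 top-level elements
str(nested)           # shows the inner list indented under $tags
```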

To access the elements of a list we can use the following syntax:

  • myList[[2]] - returns the second element of the list, "MSSQLTips"
  • myList[[3]] - returns the third element, the Atomic Logical Vector TRUE FALSE TRUE
  • myList[[3]][2] - returns the second element of that Logical Vector, the value FALSE
myList[[2]]   
myList[[3]]   
myList[[3]][2]
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
print(myList[[2]]) 
print(myList[[3]]) 
print(myList[[3]][2]) 
';
EXEC sp_execute_external_script 
@language = N'R', 
@script = @rscript; 
GO
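A common source of confusion is the difference between single and double brackets. The sketch below contrasts myList[2] with myList[[2]] and shows that, once the elements have names (the names here are hypothetical), the $ operator gives the same result as [[ ]].

```r
myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))

single <- myList[2]     # single brackets: a list containing element 2
inner  <- myList[[2]]   # double brackets: element 2 itself
class(single)   # "list"
class(inner)    # "character"

# With (hypothetical) names, $ extracts an element just like [[ ]]
names(myList) <- c("ints", "site", "flags", "nums")
myList$flags[2]   # FALSE, same as myList[[3]][2]
```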

Attributes

All objects can have additional attributes, used to store metadata about the object. Attributes can be thought of as a named list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes().

Let us see an example of how to assign an attribute value to a List:

# Assign Attribute
attr(myList,"My Attribute") <- "My First Attribute"
# Display Attribute
attr(myList,"My Attribute")
attributes(myList)
str(attributes(myList))
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
   myList <- list(1:5, "MSSQLTips", c(TRUE, FALSE, TRUE), c(3.3, 9.9, 12.2))
   # Assign Attribute
   attr(myList,"My Attribute") <- "My First Attribute"
   # Display Attribute
   print(attr(myList,"My Attribute"))
   print(attributes(myList))
   str(attributes(myList))
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
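Attributes are not only for user metadata: R itself stores element names and dimensions as attributes. As a sketch (the attribute and element names are illustrative), setting names() adds a "names" attribute next to our custom one, and setting a "dim" attribute on a plain vector is enough to turn it into a Matrix.

```r
myList <- list(1:5, "MSSQLTips")
attr(myList, "My Attribute") <- "My First Attribute"
names(myList) <- c("ints", "site")   # illustrative element names

# Both the custom attribute and "names" are now reported
names(attributes(myList))

# "dim" is the attribute that makes a vector behave as a matrix
v <- 1:6
attr(v, "dim") <- c(2, 3)
is.matrix(v)   # TRUE
dim(v)         # 2 3
```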

Matrix and Array

A Matrix is a two-dimensional Atomic array, commonly used as part of the mathematical machinery of statistics. In R, a Matrix is created with the matrix() function and an Array with the array() function.

In the following example, we create a matrix and execute a few basic operations on it.

# Create a new Matrix
myMatrix <- matrix(1:6, ncol = 3, nrow = 2)
# Display Matrix Values
myMatrix
# Display Matrix Structure
str(myMatrix)
# Display Matrix length and number of columns and rows
length(myMatrix) 
nrow(myMatrix)  
ncol(myMatrix)
# Assign a value to a Matrix Cell
myMatrix[2,2] <- 99
# Display Matrix Values
myMatrix
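One detail worth knowing is the order in which matrix() places the values: by default it fills the Matrix column by column, so 1:6 in two rows becomes the columns 1 2, 3 4, and 5 6. The byrow argument changes this, as the sketch below shows.

```r
# Default: values fill column by column
m_col <- matrix(1:6, nrow = 2)
m_col[1, ]   # 1 3 5

# byrow = TRUE fills row by row instead
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_row[1, ]   # 1 2 3

dim(m_col)   # 2 3: ncol is inferred from the data length
```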

It is also possible to define a name for each row and column of a Matrix using the rownames() and colnames() functions.

rownames(myMatrix) <- c("X", "Y")
colnames(myMatrix) <- c("A", "B", "C")
myMatrix
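Once rows and columns are named, cells can be addressed by name as well as by position. The sketch below rebuilds myMatrix as in the earlier steps and reads the cell that was set to 99 both ways.

```r
myMatrix <- matrix(1:6, ncol = 3, nrow = 2)
myMatrix[2, 2] <- 99
rownames(myMatrix) <- c("X", "Y")
colnames(myMatrix) <- c("A", "B", "C")

myMatrix["Y", "B"]   # 99, the same cell as myMatrix[2, 2]
myMatrix["X", ]      # whole row X, with elements named A B C
dimnames(myMatrix)   # list of the row names and the column names
```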

Mathematical operations on matrices are straightforward. For example, let's add two matrices of the same dimensions; the sum is computed element by element.

# Define a New Matrix
myMatrix1 <- matrix(1:6, ncol = 3, nrow = 2)
myMatrix1
# Matrix Sum
m <- myMatrix1 + myMatrix
m

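We can execute the same code in SSMS by placing it inside sp_execute_external_script and wrapping each value we want to display in print(), as in the previous examples.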

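Note that + (like -, * and /) works element by element and requires matrices of the same dimensions. For linear algebra, R provides the %*% operator for the matrix product and t() for the transpose; the sketch below contrasts the two kinds of multiplication on myMatrix1.

```r
myMatrix1 <- matrix(1:6, ncol = 3, nrow = 2)

# Element-wise product: each cell multiplied by itself
elementwise <- myMatrix1 * myMatrix1
elementwise   # rows: 1 9 25 / 4 16 36

# Matrix product of a 2x3 matrix with its 3x2 transpose gives a 2x2 matrix
product <- myMatrix1 %*% t(myMatrix1)
dim(product)  # 2 2
product       # rows: 35 44 / 44 56
```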

An array is a vector with one or more dimensions. So, an array with one dimension is (almost) the same as a vector. An array with two dimensions is (almost) the same as a matrix. An array with three or more dimensions is an n-dimensional array.

Let us see it with an example.

# Example array code:
# One dimension: an array of three zeros [0.0 0.0 0.0]
myarr <- array(0.0, 3)
print(myarr)
# Add a dimension and we get a Matrix
myarr <- array(0.0, c(2, 3))  # 2x3 matrix
print(myarr)
# Add another dimension
myarr <- array(0.0, c(2, 5, 4))  # 2x5x4 n-dimensional array
print(myarr)  # 40 values displayed
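One way to see the "(almost)" for the one-dimensional case is the dim attribute: an array always has one, while a plain vector does not. The sketch below checks this and shows how a single element and a whole two-dimensional slice are extracted from a three-dimensional array (arr3 is an illustrative name).

```r
one_d <- array(0.0, 3)
is.array(one_d)        # TRUE: it has a dim attribute
is.array(c(0, 0, 0))   # FALSE: a plain vector has no dim attribute

# 24 values arranged as 2 rows x 3 columns x 4 layers
arr3 <- array(1:24, dim = c(2, 3, 4))
arr3[1, 2, 3]   # row 1, column 2, layer 3
arr3[, , 1]     # the first layer is an ordinary 2x3 matrix
```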

Data Frames

A Data Frame is the most common way of storing and working with data in R. Data Frames are nothing more than a list of equal-length vectors, making them a 2-dimensional structure. Data Frames share the properties of both the matrix and list.

A Data Frame has names(), colnames(), and rownames(), although names() and colnames() return the same thing. The length() of a Data Frame is the length of the underlying list, and so is the same as ncol(); nrow() gives the number of rows.

Let us see how to create and work with a Data Frame. Our first example creates a Data Frame with 5 observations (rows) and two variables (columns), x and y.

# Create a Data Frame
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d","e"))
str(df)
# Test the following functions on Data Frames
names(df)
colnames(df)
rownames(df)
length(df) 
ncol(df)
nrow(df)
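Because a Data Frame is a list of equal-length columns, it answers to list functions as well as to matrix-style indexing. The sketch below shows both. We pass stringsAsFactors = FALSE explicitly because R versions before 4.0, such as the R runtime bundled with older SQL Server Machine Learning Services installations, convert character columns to factors by default.

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

typeof(df)    # "list": underneath, a data frame is a list of columns
class(df)     # "data.frame"
df[["y"]]     # list-style access returns the whole column vector
df[2, "y"]    # matrix-style access: row 2 of column y, "b"
```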

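Let us add a new column and a new row of data to the existing Data Frame using the cbind() and rbind() R functions. Note that both functions return a new Data Frame and leave df unchanged; assign the result (for example, df <- cbind(...)) to keep the change.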

# Add a new column
cbind(df, data.frame(z = 9:5))
# Add a new row
rbind(df, data.frame(x=9, y="Z"))
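Since cbind() and rbind() return a new Data Frame rather than modifying the existing one, the sketch below assigns the results back to df so the changes persist. The new row must supply a value for every column, including the z column added by cbind(); the row values and the extra column name w are arbitrary illustrations.

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

# Keep the new column by assigning the result back to df
df <- cbind(df, data.frame(z = 9:5))

# The new row needs a value for x, y, and z
df <- rbind(df, data.frame(x = 6, y = "f", z = 4, stringsAsFactors = FALSE))
dim(df)   # 6 3

# Assigning to df$<name> is another way to add a column (w is illustrative)
df$w <- df$x * 2
```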

Data Frames are flexible enough that a column can even hold a list of vectors, one vector per row.

# Create a Data Frame
df <- data.frame(x = c("A","B","C"))
# Display Data Frame
print(df)
# Display only column x of the Data Frame
print(df$x)
# Assign a list of Vectors
df$y <- list(1:2, 3:5, 6:9)
print(df)
print(df$y)

The same code can also be run in SSMS through sp_execute_external_script, using print() to display each result.

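To work with such a list-column, we index it like any other list. The sketch below reads one row's vector and the length of every vector. It also shows that data.frame() accepts a list-column at creation time when the list is wrapped in I(), which stops data.frame() from trying to turn each list element into its own column.

```r
df <- data.frame(x = c("A", "B", "C"), stringsAsFactors = FALSE)
df$y <- list(1:2, 3:5, 6:9)

typeof(df$y)    # "list"
df$y[[2]]       # 3 4 5: the vector stored in row 2
lengths(df$y)   # 2 3 4: length of the vector in each row

# Inside data.frame(), wrap the list in I() to keep it as one column
df2 <- data.frame(x = c("A", "B", "C"), y = I(list(1:2, 3:5, 6:9)))
ncol(df2)       # 2
```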

Conclusion

In this tip, we have learned about the main R data structures and how to create and use them. In the next tip, we will talk about subsetting these data structures and about how to add, remove, and order data in a Data Frame.




About the author
Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.
