Subset in R including List, Data Frame, Matrix and Vector
By: Matteo Lorini
Problem
I have loaded data into an R Data Frame or any other type of data structure; what are my options to extract, manipulate and work with my data?
Solution
R provides a range of powerful, fast subsetting operators. They can be hard to learn and are not always intuitive, but knowing how to subset is essential for manipulating data in R.
In this tip we will examine the subsetting operators, the types of subsetting, and how their behavior differs across R objects such as vectors, lists, matrices, and data frames.
Atomic Vectors
Let's start with the simplest data structure to subset in R: the atomic vector. We will use a simple numeric vector as an example.
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
Elements of a vector are indexed by position, starting at 1; for example, the value 5.5 is at position five. We can access a single element with the [] operator. Let's see how it works with an example.
# Get element at position 5
x[5]

Using SSMS
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Get element at position 5
print(x[5])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
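A few edge cases help explain how positional indexing behaves. The sketch below is plain base R (run it in RStudio, or wrap it in sp_execute_external_script as shown above); it shows that positions start at 1, that position 0 selects nothing, and that a position past the end returns NA rather than raising an error.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)

first  <- x[1]          # 1.1: positions start at 1, not 0
empty  <- x[0]          # numeric(0): position 0 selects nothing
beyond <- x[11]         # NA: x has only 10 elements
last   <- x[length(x)]  # 10.1: length() gives the last position

print(first)
print(empty)
print(beyond)
print(last)
```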

Get Specific Elements
To access the elements at positions 1, 3 and 8 of vector x, we use the following command.
# Get elements at positions 1, 3 and 8
x[c(1,3,8)]
Note that the index must itself be a vector, so we use c() to combine the positions 1, 3 and 8 into a single index vector; the subsetting operation then returns a vector containing the selected elements.

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Get elements at positions 1,3 and 8
print(x[c(1,3,8)])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
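Any integer vector can serve as the index, not only one built with c(). As a small base R sketch, the : operator builds a range of positions, and the result follows the order of the index, including repeated positions.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)

# The : operator builds the index vector 2, 3, 4
range_part <- x[2:4]
print(range_part)    # 2.2 3.3 4.4

# Elements come back in index order, and repeated positions are repeated
reordered <- x[c(8, 1, 1)]
print(reordered)     # 8.8 1.1 1.1
```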

Omit Values from a Vector
We can use - (a negative sign) to omit values from a vector. The following command returns all the values of vector x except those at positions 3 and 1.
# Omit elements at positions 3 and 1
x[-c(3,1)]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Omit elements at position 3,1
print(x[-c(3,1)])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
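Negative positions are handy for dropping the last element, but they cannot be combined with positive ones in the same index. The base R sketch below shows both cases; tryCatch() is used only so the example captures the error message instead of stopping.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)

# Drop the last element by negating its position
without_last <- x[-length(x)]
print(without_last)

# Positive and negative positions cannot be mixed; R raises an error
mixed <- tryCatch(x[c(-1, 2)], error = function(e) conditionMessage(e))
print(mixed)
```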

Order Vector Elements
The order() function returns the positions of the elements sorted in ascending order; using that result as an index returns the sorted vector.
# Order a vector
y <- c(10,1,7,-3,8)
y
y[order(y)]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
# Order a vector
y <- c(10,1,7,-3,8)
print(y)
# Order Vector y
print(y[order(y)])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
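Because order() returns positions rather than values, it helps to look at its result directly. This base R sketch prints the index vector, sorts in descending order with the decreasing argument, and compares the ascending result with sort(), which returns the sorted values without the positions.

```r
y <- c(10, 1, 7, -3, 8)

idx <- order(y)
print(idx)                           # 4 2 3 5 1: positions from smallest to largest value

ascending  <- y[idx]
descending <- y[order(y, decreasing = TRUE)]
print(ascending)                     # -3 1 7 8 10
print(descending)                    # 10 8 7 1 -3

# sort() returns the same values directly when the positions are not needed
print(identical(sort(y), ascending)) # TRUE
```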

Logical Vector
Subsetting can also be done with a logical vector: the elements at positions where the index is TRUE are kept. For example, to list elements 1, 2 and 5, we can write the following logical index.
# Subsetting using a logical vector
y
# Return the elements whose position corresponds to TRUE
y[c(TRUE, TRUE, FALSE, FALSE, TRUE)]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
y <- c(10,1,7,-3,8)
# Return elements which position correspond to TRUE
print(y[c(TRUE, TRUE, FALSE, FALSE, TRUE)])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
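Two details of logical indexing are worth seeing in code. In this base R sketch, a logical index shorter than the vector is recycled, and which() converts a logical vector into the positions that are TRUE.

```r
y <- c(10, 1, 7, -3, 8)

# A shorter logical index is recycled: TRUE, FALSE becomes TRUE, FALSE, TRUE, FALSE, TRUE
every_other <- y[c(TRUE, FALSE)]
print(every_other)   # 10 7 8

# which() converts a logical vector into the positions that are TRUE
keep <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
positions <- which(keep)
print(positions)     # 1 2 5
print(y[positions])  # same result as y[keep]
```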

Filter Elements of a Vector
We can use comparison operators such as >, < and == to build a logical vector that filters the elements of a vector.
# List elements greater than 4
y[y>4]
# List elements less than 4
y[y<4]
# List elements equal to 7
y[y==7]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
y <- c(10,1,7,-3,8)
# List elements greater than 4
print(y[y>4])
# List elements less than 4
print(y[y<4])
# List elements equal to 7
print(y[y==7])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
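Conditions can be combined before filtering. The base R sketch below joins two comparisons with & (and) and | (or), shows that a condition no element meets returns an empty vector rather than an error, and uses which() to get the matching positions instead of the values.

```r
y <- c(10, 1, 7, -3, 8)

# Combine conditions with & (and) and | (or)
between <- y[y > 0 & y < 9]
print(between)             # 1 7 8
outside <- y[y < 0 | y > 9]
print(outside)             # 10 -3

# A condition no element meets returns an empty vector, not an error
none <- y[y == 4]
print(none)                # numeric(0)

# which() returns the positions of the matching elements
print(which(y > 4))        # 1 3 5
```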

Assign Names to Elements in a Vector
We can also assign names to the elements of a vector and then select elements by name.
# Assign names to the elements of a vector
w <- setNames(x, letters[1:10])
# Display all elements
w
# Select elements corresponding to letters b and h
w[c("b","h")]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Assign names to the elements of a vector
w <- setNames(x, letters[1:10])
# Display all elements
print(w)
# Select elements corresponding to letters b and h
print(w[c("b","h")])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
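Names work with the other subsetting tools as well. As a base R sketch: names() lists the names, [[ ]] with a name returns the bare value without its name, %in% combined with ! excludes elements by name, and a name that does not exist returns NA.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
w <- setNames(x, letters[1:10])

print(names(w))           # "a" "b" ... "j"

# [[ ]] with a name returns the bare value, without its name
b_value <- w[["b"]]
print(b_value)            # 2.2

# Exclude elements by name with %in% and the ! (not) operator
rest <- w[!(names(w) %in% c("b", "h"))]
print(rest)

# A name that does not exist returns NA
print(w["z"])
```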

List
Subsetting a list works much like subsetting an atomic vector, with one key difference: [] always returns a list, while [[]] returns a single component of the list itself. Let's see this with examples.
# Create a new List
v1 = c(20, 30, 50)
v2 = c("www", "mssql", "tips", ".", "com")
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList = list(v1, v2, v3, 99)
# Display the List
myList

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a new List
v1 = c(20, 30, 50)
v2 = c("www", "mssql", "tips", ".", "com")
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList = list(v1, v2, v3, 99)
# display List
print(myList)
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
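The list above has unnamed components, so they can only be reached by position. As a sketch, giving the components names lets you retrieve them with $ or with [[ ]] and a name. The names numbers, words, flags and constant are hypothetical, chosen only for illustration.

```r
v1 <- c(20, 30, 50)
v2 <- c("www", "mssql", "tips", ".", "com")
v3 <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical component names, used only for illustration
namedList <- list(numbers = v1, words = v2, flags = v3, constant = 99)

print(names(namedList))
print(namedList$words)          # same as namedList[["words"]]
print(namedList[["numbers"]])
```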

Let's use [] to retrieve the second element of the list, and [[]] followed by [3] to get the value "tips".
# Get 2nd element of the List
myList[2]
# Get the 3rd value of the 2nd element
myList[[2]][3]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a new List
v1 = c(20, 30, 50)
v2 = c("www", "mssql", "tips", ".", "com")
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList = list(v1, v2, v3, 99)
# Get 2nd element of the List
print(myList[2])
# Get the 3rd value of the 2nd element
print(myList[[2]][3])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
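The difference between [] and [[]] is easiest to see by checking the class and length of each result. This base R sketch does that for the second component, and also shows that [[ ]] given a vector of indices recurses into nested components.

```r
v1 <- c(20, 30, 50)
v2 <- c("www", "mssql", "tips", ".", "com")
v3 <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList <- list(v1, v2, v3, 99)

as_list   <- myList[2]     # a list with one component
as_vector <- myList[[2]]   # the character vector stored in that component

print(class(as_list))      # "list"
print(length(as_list))     # 1
print(class(as_vector))    # "character"
print(length(as_vector))   # 5

# [[ ]] with a vector of indices recurses: component 2, then element 3
print(myList[[c(2, 3)]])   # "tips"
```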

Matrix
The simplest way to subset a matrix is to supply an index for each dimension, in the form [row, column]. Let's see some examples of how to work with a matrix.
# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
# Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
myMatrix

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
# Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
print(myMatrix)
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
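It helps to know how matrix() lays out its values before indexing. In this base R sketch, the values 1:9 are stored column by column (unless byrow = TRUE is given), so a single position counts down the columns and position 5 lands on row 2, column 2.

```r
myMatrix <- matrix(1:9, nrow = 3)
colnames(myMatrix) <- c("A", "B", "C")
rownames(myMatrix) <- c("X", "Y", "Z")

print(dim(myMatrix))   # 3 3

# Values are stored column by column, so position 5 is row 2, column 2
print(myMatrix[5])     # 5

# byrow = TRUE fills row by row instead
byRow <- matrix(1:9, nrow = 3, byrow = TRUE)
print(byRow[1, ])      # 1 2 3
```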

Now that we have created our Matrix, let's see how we can access its elements.
# Access Element 5 at coordinate 2,2
myMatrix[2,2]
myMatrix["Y","B"]
# Access Element 8 at coordinate 2,3
myMatrix[2,3]
myMatrix["Y","C"]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
# Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
print(myMatrix)
print("Access Element 5 at coordinate 2,2")
print(myMatrix[2,2])
print(myMatrix["Y","B"])
print("Access Element 8 at coordinate 2,3")
print(myMatrix[2,3])
print(myMatrix["Y","C"])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
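Matrix indices do not have to be single values. This base R sketch leaves one index blank to select a whole row or column, uses drop = FALSE to keep a single row as a 1x3 matrix instead of a vector, and filters with a logical matrix, which returns the matching values in column order.

```r
myMatrix <- matrix(1:9, nrow = 3)
colnames(myMatrix) <- c("A", "B", "C")
rownames(myMatrix) <- c("X", "Y", "Z")

# Leaving an index blank selects the whole row or column
rowY <- myMatrix["Y", ]   # A = 2, B = 5, C = 8
colB <- myMatrix[, "B"]   # X = 4, Y = 5, Z = 6
print(rowY)
print(colB)

# By default a single row is simplified to a vector; drop = FALSE keeps a 1x3 matrix
rowMatrix <- myMatrix[2, , drop = FALSE]
print(dim(rowMatrix))     # 1 3

# A logical matrix selects values in column order
print(myMatrix[myMatrix > 5])  # 6 7 8 9
```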

A matrix always has exactly two dimensions; a data structure with three or more dimensions is an array. Let's define a 2x5x4 array and see how to assign and retrieve a value.
# Create a multidimensional array
myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
print(myarr) # 40 values displayed
# Assign a value to location 2,3,4
myarr[2,3,4] <- 1
print(myarr)
# Display the value at location 2,3,4
print(myarr[2,3,4])


DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a multidimensional array
myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
print(myarr) # 40 values displayed
# Assign a value to location 2,3,4
myarr[2,3,4] <- 1
print(myarr)
# Display the value at location 2,3,4
print(myarr[2,3,4])
';
EXEC sp_execute_external_script @language = N'R', @script = @rscript;
GO
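Printing all 40 values is hard to read, so it helps to slice the array instead. In this base R sketch, blank indices select whole dimensions (here, the 2x5 matrix in layer 4), and which() with arr.ind = TRUE reports the row, column and layer of the cell that was set to 1.

```r
myarr <- array(0.0, c(2, 5, 4))
myarr[2, 3, 4] <- 1

print(dim(myarr))                 # 2 5 4

# Blank indices select whole dimensions; this is the 2x5 matrix in layer 4
layer4 <- myarr[, , 4]
print(layer4)

# which() with arr.ind = TRUE reports the row, column and layer of matching cells
loc <- which(myarr == 1, arr.ind = TRUE)
print(loc)                        # one row: dim1 = 2, dim2 = 3, dim3 = 4
```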

Data Frames
Data frames play an important role in data science. Let's see with an example how to create and subset a simple data frame.
# Define a Data Frame of mixed numbers and letters
df <- data.frame(x = 1:5, z = letters[1:5], w = letters[6:10], y = 5:1)
# Display the data frame contents
print(df)
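A data frame is a list of equal-length columns, so the list operators from earlier apply to it. This base R sketch shows that $ and [[ ]] return a column as a vector, while single brackets with one column name return a one-column data frame.

```r
df <- data.frame(x = 1:5, z = letters[1:5], w = letters[6:10], y = 5:1)

print(dim(df))        # 5 4: five rows, four columns

# $ and [[ ]] return a column as a vector
print(df$y)           # 5 4 3 2 1
print(df[["y"]])      # same vector

# Single brackets with one name return a one-column data frame
print(class(df["y"])) # "data.frame"
```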

print(df)
print("Display the row that has c in column z")
df[df$z == "c", ]

print(df)
print("Display a single value")
df[df$z == "c", "y"]
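Selecting a single column from a single row returns a plain value rather than a data frame. As a base R sketch, passing drop = FALSE keeps the one-row, one-column result as a data frame, which can be useful when later code expects a data frame.

```r
df <- data.frame(x = 1:5, z = letters[1:5], w = letters[6:10], y = 5:1)

# Selecting one column simplifies the result to a vector
single <- df[df$z == "c", "y"]
print(single)         # 3

# drop = FALSE keeps the result as a data frame, with the row name "3"
kept <- df[df$z == "c", "y", drop = FALSE]
print(kept)
```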

print("Display 1st and 5th rows")
df[c(1, 5), ]
print("Display 2nd and 3rd columns")
df[, c(2, 3)]
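The vector techniques from earlier carry over to data frame rows and columns. This base R sketch combines two row conditions with &, uses the base subset() function to express a filter with bare column names, drops a column with a negative index, and reorders rows with order().

```r
df <- data.frame(x = 1:5, z = letters[1:5], w = letters[6:10], y = 5:1)

# Rows that meet two conditions at once: the rows where x is 3 and 4
both <- df[df$x > 2 & df$y > 1, ]
print(both)

# subset() expresses a filter with column names instead of df$
picked <- subset(df, x > 2, select = c(z, y))
print(picked)

# A negative column index drops a column
print(df[, -1])

# order() on a column reorders the rows
by_y <- df[order(df$y), ]
print(by_y)
```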

Conclusion
In this tip we learned how to subset the main R data structures. In the next tip we will look at subsetting with assignment, lookup tables, matching, merging and other R commands.
Next Steps
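- To test the R examples in this tip, the reader will need R and an IDE such as RStudio; the SSMS examples require SQL Server with R Services installed and external scripts enabled.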
- Check out these tips
- Getting started with R
- Quick Start Guide for Data Science with SQL Server and R Services
- Data exploration with R
- SQL Server data access Using R
- R with T-SQL
- AdventureWorks2014
- Export SQL Server table to Excel
- sp_execute_external_script
- Data Structures in R including Vector, Matrix, Array, List, and Data Frame