By: Matteo Lorini | Comments | Related: More > R Language
Problem
I have loaded data into an R Data Frame or any other type of data structure; what are my options to extract, manipulate and work with my data?
Solution
R exposes a range of powerful and fast subsetting operations. Subsetting operations can be hard to learn, and they can be non-intuitive; however, learning how to subset R data is crucial to manipulate data.
In this article we will examine subsetting operators, types of subsetting, differences in behavior for different R objects like vectors, lists, and data frames.
Atomic Vectors
Let's start with the easiest subsetting type of data structure in R that are Atomic Vectors. We will examine it by using a simple example of numeric vector.
# Subsetting x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
Elements of the vector are in order position, for example, value 5.5 is at position five in the vector. We can access a single element by using [], let see how it works with an example.
# Get element at position 5 x[5]
Using SSMS
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Subsetting x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1) # Get element at position 5 print(x[5]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Get Specific Elements
If we want to access elements at position 1, 3 and 8 of our vector x, we use the following command.
# Get elements at positions 1,3 and 8 x[c(1,3,8)]
Please note that the subsetting operation returns a Vector data type therefore; I had to use the c() command to combine the output results.
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Subsetting x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1) # Get elements at positions 1,3 and 8 print(x[c(1,3,8)]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Omit Values from a Vector
We can use – (negative sign) to omit the value from a vector. The following command will return all the values of vector x except the one at position 3 and 1.
# Omit elements at position 3,1 x[-c(3,1)]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Subsetting x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1) # Omit elements at position 3,1 print(x[-c(3,1)]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Order Vector Elements
Order() function is used to order the Vector elements.
# Order a vector y <- c(10,1,7,-3,8) y y[order(y)]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Subsetting # Order a vector y <- c(10,1,7,-3,8) print(y) # Order Vector y print(y[order(y)]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Logical Vector
Subsetting can also be done using a logical vector for example, if we want to list element 1, 2 and 5, we can write the following subsetting logical vector.
#Subsetting using logical vector y # Return elements which position correspond to TRUE y[c(TRUE, TRUE, FALSE, FALSE, TRUE)]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' y <- c(10,1,7,-3,8) # Return elements which position correspond to TRUE print(y[c(TRUE, TRUE, FALSE, FALSE, TRUE)]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Filter Elements of a Vector
We can use logical operators like >, < and == to filter elements of a vector.
#List elements greater than 4 y[y>4] #List elements less than 4 y[y<4] #List elements equal to 4 y[y==7]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' y <- c(10,1,7,-3,8) #List elements greater than 4 print(y[y>4]) #List elements less than 4 print(y[y<4]) #List elements equal to 4 print(y[y==7]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Assign Names to Elements in a Vector
Another interesting part is to assign names to elements in a Vector and filter the output based on names.
# Assign names to an element vector w <- setNames(x, letters[1:10]) #Display all Elements w # Select elements corresponding to letter b and h w[c("b","h")]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1) # Assign names to an element vector w <- setNames(x, letters[1:10]) #Display all Elements print(w) # Select elements corresponding to letter b and h print(w[c("b","h")]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
List
Subsetting List operations are like the ones used for an Atomic Vector with the difference that [] always returns a list while [[]] operators returns a component of the List. Let's see it with examples.
# Create a new List v1 = c(20, 30, 50) v2 = c("www", "mssql", "tips", ".", "com") v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) myList = list(v1, v2, v3, 99) # display List myList
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Create a new List v1 = c(20, 30, 50) v2 = c("www", "mssql", "tips", ".", "com") v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) myList = list(v1, v2, v3, 99) # display List print(myList) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Let's use [] to retrieve the second element of the list and [[]] to get the element "tips".
# Get 2nd element of the List myList[2] # Get the 3rd Value of the myList[[2]][3]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Create a new List v1 = c(20, 30, 50) v2 = c("www", "mssql", "tips", ".", "com") v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) myList = list(v1, v2, v3, 99) # Get 2nd element of the List print(myList[2]) # Get the 3rd Value of the print(myList[[2]][3]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Matrix
The simple way to subset a Matrix is to use an index for each dimension. Let's see some examples of how to work with a Matrix.
# Create a 3X3 Matrix myMatrix <- matrix(1:9, nrow = 3) # Assign names to the columns colnames(myMatrix) <- c("A", "B", "C") #Assign names to the rows rownames(myMatrix) <- c("X", "Y", "Z") # Display the Matrix myMatrix
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Create a 3X3 Matrix myMatrix <- matrix(1:9, nrow = 3) # Assign names to the columns colnames(myMatrix) <- c("A", "B", "C") #Assign names to the rows rownames(myMatrix) <- c("X", "Y", "Z") # Display the Matrix print(myMatrix) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Now that we have created our Matrix, let's see how we can access its elements.
# Access Element 5 at coordinate 2,2 myMatrix[2,2] myMatrix["Y","B"] # Access Element 8 at coordinate 2,3 myMatrix[2,3] myMatrix["Y","C"]
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Create a 3X3 Matrix myMatrix <- matrix(1:9, nrow = 3) # Assign names to the columns colnames(myMatrix) <- c("A", "B", "C") #Assign names to the rows rownames(myMatrix) <- c("X", "Y", "Z") # Display the Matrix print(myMatrix) print("Access Element 5 at coordinate 2,2") print(myMatrix[2,2]) print(myMatrix["Y","B"]) print("Access Element 8 at coordinate 2,3") print(myMatrix[2,3]) print(myMatrix["Y","C"]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
A Matrix can only have 2 dimensions, a three or more dimension data structure is an Array. Let's define a 2X5X4 Array and see how to assign and retrieve a value from it.
# Create a multi dimensioanl arraymyarr = array(0.0, c(2,5,4)) # 2x5x4 n-array print(myarr) # 40 values displayed #Assign a Value to location 2,3,4 myarr[2,3,4] <- 1 #D#Display Value at location 2,3,4 print(myarr)print(myarr[2,3,4])
DECLARE @rscript NVARCHAR(MAX); SET @rscript = N' # Create a multi dimensioanl array myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array print(myarr) # 40 values displayed #Assign a Value to location 2,3,4 myarr[2,3,4] <- 1 print(myarr) #Display Value at location 2,3,4 print(myarr[2,3,4]) '; EXEC sp_execute_external_script @language = N'R', @script = @rscript; GO
Data Frames
Data Frames play an important role in Data Science, let's see with an example how we can create and subsetting a simple Data Frame.
# Define a Data Frame of mix Numbers and Letters df <- data.frame(x = 1:5, z = letters[1:5],w = letters[6:10], y = 5:1 ) # Display the data frame contents print(df)
print(df) print("Display a row that have c as element in column z ") df[df$z == "c", ]
print(df) print("Display a Single Value ") df[df$z == "c", "y"]
prprint("Display 1st and 5th rows") df[c(1, 5), ]prprint("Display 2nd and 3th columns") df[,c(2, 3)]
Conclusion
In the tip we have learned subsetting of the main R data types. In the next tip we will see subsetting and assignment, lookup table, matching, merging and other R commands.
Next Steps
- The reader will need to install RStudio in order to test this tip.
- Check out these tips
- Getting started with R
- Quick Start Guide for Data Science with SQL Server and R Services
- Data exploration with R
- SQL Server data access Using R
- R with T-SQL
- AdventureWork2014
- Export SQL Server table to Excel
- sp_execute_external_script
- Data Structures in R including Vector, Matrix, Array, List, and Data Frame
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips