Problem
I have loaded data into an R Data Frame or any other type of data structure; what are my options to extract, manipulate and work with my data?
Solution
R exposes a range of powerful and fast subsetting operations. Subsetting operations can be hard to learn, and they can be non-intuitive; however, learning how to subset R data is crucial to manipulate data.
In this article we will examine subsetting operators, types of subsetting, differences in behavior for different R objects like vectors, lists, and data frames.
Atomic Vectors
Let’s start with the easiest subsetting type of data structure in R that are Atomic Vectors. We will examine it by using a simple example of numeric vector.
# Subsetting x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
Elements of the vector are in order position, for example, value 5.5 is at position five in the vector. We can access a single element by using [], let see how it works with an example.
# Get element at position 5 x[5]

Using SSMS
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Get element at position 5
print(x[5])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Get Specific Elements
If we want to access elements at position 1, 3 and 8 of our vector x, we use the following command.
# Get elements at positions 1,3 and 8 x[c(1,3,8)]
Please note that the subsetting operation returns a Vector data type therefore; I had to use the c() command to combine the output results.

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Get elements at positions 1,3 and 8
print(x[c(1,3,8)])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Omit Values from a Vector
We can use – (negative sign) to omit the value from a vector. The following command will return all the values of vector x except the one at position 3 and 1.
# Omit elements at position 3,1 x[-c(3,1)]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Omit elements at position 3,1
print(x[-c(3,1)])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Order Vector Elements
Order() function is used to order the Vector elements.
# Order a vector y <- c(10,1,7,-3,8) y y[order(y)]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Subsetting
# Order a vector
y <- c(10,1,7,-3,8)
print(y)
# Order Vector y
print(y[order(y)])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Logical Vector
Subsetting can also be done using a logical vector for example, if we want to list element 1, 2 and 5, we can write the following subsetting logical vector.
#Subsetting using logical vector y # Return elements which position correspond to TRUE y[c(TRUE, TRUE, FALSE, FALSE, TRUE)]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
y <- c(10,1,7,-3,8)
# Return elements which position correspond to TRUE
print(y[c(TRUE, TRUE, FALSE, FALSE, TRUE)])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO
Filter Elements of a Vector
We can use logical operators like >, < and == to filter elements of a vector.
#List elements greater than 4 y[y>4] #List elements less than 4 y[y<4] #List elements equal to 4 y[y==7]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
y <- c(10,1,7,-3,8)
#List elements greater than 4
print(y[y>4])
#List elements less than 4
print(y[y<4])
#List elements equal to 4
print(y[y==7])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Assign Names to Elements in a Vector
Another interesting part is to assign names to elements in a Vector and filter the output based on names.
# Assign names to an element vector
w <- setNames(x, letters[1:10])
#Display all Elements
w
# Select elements corresponding to letter b and h
w[c("b","h")]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
# Assign names to an element vector
w <- setNames(x, letters[1:10])
#Display all Elements
print(w)
# Select elements corresponding to letter b and h
print(w[c("b","h")])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

List
Subsetting List operations are like the ones used for an Atomic Vector with the difference that [] always returns a list while [[]] operators returns a component of the List. Let’s see it with examples.
# Create a new List
v1 = c(20, 30, 50)
v2 = c("www", "mssql", "tips", ".", "com")
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList = list(v1, v2, v3, 99)
# display List
myList

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a new List
v1 = c(20, 30, 50)
v2 = c("www", "mssql", "tips", ".", "com")
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList = list(v1, v2, v3, 99)
# display List
print(myList)
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Let’s use [] to retrieve the second element of the list and [[]] to get the element “tips”.
# Get 2nd element of the List myList[2] # Get the 3rd Value of the myList[[2]][3]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a new List
v1 = c(20, 30, 50)
v2 = c("www", "mssql", "tips", ".", "com")
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE)
myList = list(v1, v2, v3, 99)
# Get 2nd element of the List
print(myList[2])
# Get the 3rd Value of the
print(myList[[2]][3])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Matrix
The simple way to subset a Matrix is to use an index for each dimension. Let’s see some examples of how to work with a Matrix.
# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
#Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
myMatrix

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
#Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
print(myMatrix)
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Now that we have created our Matrix, let’s see how we can access its elements.
# Access Element 5 at coordinate 2,2 myMatrix[2,2] myMatrix["Y","B"] # Access Element 8 at coordinate 2,3 myMatrix[2,3] myMatrix["Y","C"]

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
#Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
print(myMatrix)
print("Access Element 5 at coordinate 2,2")
print(myMatrix[2,2])
print(myMatrix["Y","B"])
print("Access Element 8 at coordinate 2,3")
print(myMatrix[2,3])
print(myMatrix["Y","C"])
';
EXEC sp_execute_external_script @language = N'R',
@script = @rscript;
GO

A Matrix can only have 2 dimensions, a three or more dimension data structure is an Array. Let’s define a 2X5X4 Array and see how to assign and retrieve a value from it.
# Create a multi dimensioanl arraymyarr = array(0.0, c(2,5,4)) # 2x5x4 n-array print(myarr) # 40 values displayed #Assign a Value to location 2,3,4 myarr[2,3,4] <- 1 #D#Display Value at location 2,3,4 print(myarr)print(myarr[2,3,4])


DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
# Create a multi dimensioanl array
myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
print(myarr) # 40 values displayed
#Assign a Value to location 2,3,4
myarr[2,3,4] <- 1
print(myarr)
#Display Value at location 2,3,4
print(myarr[2,3,4])
';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript;
GO

Data Frames
Data Frames play an important role in Data Science, let’s see with an example how we can create and subsetting a simple Data Frame.
# Define a Data Frame of mix Numbers and Letters df <- data.frame(x = 1:5, z = letters[1:5],w = letters[6:10], y = 5:1 ) # Display the data frame contents print(df)

print(df)
print("Display a row that have c as element in column z ")
df[df$z == "c", ]

print(df)
print("Display a Single Value ")
df[df$z == "c", "y"]

prprint("Display 1st and 5th rows")
df[c(1, 5), ]prprint("Display 2nd and 3th columns")
df[,c(2, 3)]
Conclusion
In the tip we have learned subsetting of the main R data types. In the next tip we will see subsetting and assignment, lookup table, matching, merging and other R commands.
Next Steps
- The reader will need to install RStudio in order to test this tip.
- Check out these tips
- Getting started with R
- Quick Start Guide for Data Science with SQL Server and R Services
- Data exploration with R
- SQL Server data access Using R
- R with T-SQL
- AdventureWork2014
- Export SQL Server table to Excel
- sp_execute_external_script
- Data Structures in R including Vector, Matrix, Array, List, and Data Frame

Matteo is currently a DBA for one of the largest global electronic payment service providers in the U.S. He has been working in IT since he graduated from college in 1993. Matteo has a Master’s degree in Computer Science, and specializes in MSSQL with a good working knowledge of MySQL as well.
- MSSQLTips Awards: Rising Star (50+ tips) – 2021 | Author Contender – 2020