Subset in R including List, Data Frame, Matrix and Vector



Problem

I have loaded data into an R Data Frame or any other type of data structure; what are my options to extract, manipulate and work with my data?

Solution

R offers a range of powerful and fast subsetting operations. They can be hard to learn and are sometimes non-intuitive; however, learning how to subset data is crucial for manipulating it in R.

In this article we will examine the subsetting operators, the types of subsetting, and how their behavior differs across R objects such as vectors, lists, matrices and data frames.

Atomic Vectors

Let's start with the easiest data structure to subset in R: the Atomic Vector. We will examine it using a simple numeric vector.

# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)

The elements of a vector are stored in order, and R counts positions starting at 1; for example, the value 5.5 is at position five in the vector. We can access a single element by using []. Let's see how it works with an example.

# Get element at position 5
x[5]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Get element at position 5
    print(x[5])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO

Get Specific Elements

If we want to access elements at position 1, 3 and 8 of our vector x, we use the following command.

# Get elements at positions 1,3 and 8
x[c(1,3,8)]

Please note that the index passed to [] must itself be a vector; therefore, I had to use the c() command to combine the positions 1, 3 and 8. The result of the subsetting operation is also a vector.

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Get elements at positions 1,3 and 8
    print(x[c(1,3,8)])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
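
As a further sketch of positional subsetting (not part of the original examples), the index vector can also be a range, can repeat positions, and can point past the end of the vector. The variable names below are illustrative only.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)

# A range of positions: elements 2 through 4
first_range <- x[2:4]

# Positions may be repeated; the element is returned once per occurrence
repeated <- x[c(1, 1, 3)]

# The last element, using length() instead of a hard-coded position
last_value <- x[length(x)]

# A position past the end of the vector yields NA rather than an error
out_of_range <- x[11]

print(first_range)
print(repeated)
print(last_value)
print(out_of_range)
```

The NA returned for position 11 is worth remembering: an index that is too large does not stop the script, so a wrong position can go unnoticed.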

Omit Values from a Vector

We can use - (the minus sign) to omit values from a vector. The following command returns all the values of vector x except the ones at positions 3 and 1.

# Omit elements at position 3,1
x[-c(3,1)]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Omit elements at position 3,1
    print(x[-c(3,1)])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
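
A minimal sketch of two details of negative indexes that the example above does not cover: a range needs parentheses before it is negated, and positive and negative positions cannot be mixed in one index. The shorter vector and the variable names are assumptions made for illustration.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5)

# Drop only the first element
without_first <- x[-1]

# Drop positions 2 to 4; the parentheses matter, because -2:4 is the
# sequence -2, -1, 0, 1, 2, 3, 4 and not "minus the range 2:4"
without_middle <- x[-(2:4)]

# Mixing positive and negative positions raises an error;
# tryCatch() captures it so the script keeps running
mixed <- tryCatch(x[c(-1, 2)], error = function(e) e)
is_error <- inherits(mixed, "error")

print(without_first)
print(without_middle)
print(is_error)
```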

Order Vector Elements

The order() function returns the positions that would sort a vector. Using that result as an index returns the vector elements in ascending order.

# Order a vector
y <- c(10,1,7,-3,8)
y
y[order(y)] 

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    # Order a vector
    y <- c(10,1,7,-3,8)
    print(y)
    # Order Vector y
    print(y[order(y)])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
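
To make the role of order() more concrete, the sketch below prints the positions it returns and compares the result with sort(). The decreasing argument and sort() are standard base R; the variable names are illustrative.

```r
y <- c(10, 1, 7, -3, 8)

# order() returns positions, not values: the smallest value (-3) is at
# position 4, the next one (1) at position 2, and so on
idx <- order(y)
print(idx)

# Using the positions as an index gives the sorted values
ascending <- y[idx]

# decreasing = TRUE reverses the direction
descending <- y[order(y, decreasing = TRUE)]

# sort() returns the sorted values directly
sorted <- sort(y)

print(ascending)
print(descending)
print(identical(ascending, sorted))
```

Because order() works with positions, the same index can be reused to reorder other vectors of the same length, which is how rows of a data frame are usually sorted.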

Logical Vector

Subsetting can also be done using a logical vector. For example, if we want to list elements 1, 2 and 5, we can use the following logical vector as the index.

# Subsetting using a logical vector
y
# Return the elements whose positions correspond to TRUE
y[c(TRUE, TRUE, FALSE, FALSE, TRUE)]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
    y <- c(10,1,7,-3,8)
    # Return the elements whose positions correspond to TRUE
    print(y[c(TRUE, TRUE, FALSE, FALSE, TRUE)])
   ';
EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
GO	
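
One behavior worth knowing, sketched here under the assumption that the same vector y is used: when the logical index is shorter than the vector, R recycles it. The which() function converts a logical vector into the matching positions.

```r
y <- c(10, 1, 7, -3, 8)

# The index c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE TRUE,
# so every other element is returned
every_other <- y[c(TRUE, FALSE)]

# which() turns a logical vector into the positions that are TRUE
true_positions <- which(c(TRUE, TRUE, FALSE, FALSE, TRUE))

print(every_other)
print(true_positions)
```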

Filter Elements of a Vector

We can use comparison operators like >, < and == to build a logical vector that filters the elements of a vector.

# List elements greater than 4
y[y>4]
# List elements less than 4
y[y<4]
# List elements equal to 7
y[y==7]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    y <- c(10,1,7,-3,8)
    #List elements greater than 4
    print(y[y>4])
    #List elements less than 4
    print(y[y<4])
    #List elements equal to 7
    print(y[y==7])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
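
Comparisons can be combined. The following sketch, which goes slightly beyond the original examples, uses the element-wise operators & (and) and | (or), the %in% operator, and which() to get positions instead of values.

```r
y <- c(10, 1, 7, -3, 8)

# Elements greater than 0 and less than 9
between <- y[y > 0 & y < 9]

# Elements less than 0 or greater than 9
extremes <- y[y < 0 | y > 9]

# Elements that appear in a given set of values
in_set <- y[y %in% c(7, 8, 99)]

# Positions (not values) of the elements greater than 4
positions <- which(y > 4)

print(between)
print(extremes)
print(in_set)
print(positions)
```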

Assign Names to Elements in a Vector

Another useful option is to assign names to the elements of a vector and then select elements by name.

# Assign names to the elements of a vector
w <- setNames(x, letters[1:10])
# Display all elements
w
# Select the elements named b and h
w[c("b","h")]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Assign names to the elements of a vector
    w <- setNames(x, letters[1:10])
    #Display all Elements
    print(w)
    # Select elements corresponding to letter b and h
    print(w[c("b","h")])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
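
A short sketch of what happens around the edges of name-based subsetting, using a smaller named vector chosen for illustration: a name that does not exist returns NA, and [[ ]] returns a single value without its name.

```r
w <- setNames(c(1.1, 2.2, 3.3), c("a", "b", "c"))

# List the names attached to the vector
element_names <- names(w)

# [] keeps the name of the selected element
named_b <- w["b"]

# [[ ]] returns the bare value, without the name
bare_b <- w[["b"]]

# A name that is not present returns NA instead of an error
missing_name <- w["z"]

print(element_names)
print(named_b)
print(bare_b)
print(missing_name)
```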

List

Subsetting a List works much like subsetting an Atomic Vector, with one difference: [] always returns a list, while [[]] returns a single component of the List. Let's see it with examples.

# Create a new List 
v1 = c(20, 30, 50) 
v2 = c("www", "mssql", "tips", ".", "com") 
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
myList = list(v1, v2, v3, 99)
# display List
myList

Using SSMS

  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a new List 
    v1 = c(20, 30, 50) 
    v2 = c("www", "mssql", "tips", ".", "com") 
    v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
    myList = list(v1, v2, v3, 99)
    # display List
    print(myList)
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO

Let's use [] to retrieve the second element of the list (returned as a one-element list) and [[]] followed by [] to get the value "tips".

# Get the 2nd element of the List
myList[2]
# Get the 3rd value of the 2nd element
myList[[2]][3]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a new List 
    v1 = c(20, 30, 50) 
    v2 = c("www", "mssql", "tips", ".", "com") 
    v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
    myList = list(v1, v2, v3, 99)
    # Get 2nd element of the List
    print(myList[2])
    # Get the 3rd value of the 2nd element
    print(myList[[2]][3])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
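
The difference between [] and [[]] is easier to see when the types of the results are inspected. The sketch below assumes a list with named components; the names numbers and words are hypothetical and do not appear in the original list.

```r
v1 <- c(20, 30, 50)
v2 <- c("www", "mssql", "tips", ".", "com")

# A list with named components (the names are illustrative)
myList <- list(numbers = v1, words = v2)

# [] returns a list of length 1 that still wraps the component
sub_list <- myList[2]

# [[ ]] returns the component itself, here a character vector
component <- myList[[2]]

# Named components can also be reached with $ or with [[ "name" ]]
by_dollar <- myList$words
by_string <- myList[["words"]]

print(class(sub_list))
print(class(component))
print(component[3])
```

This is why myList[2][3] would not return "tips": myList[2] is a list with a single element, so there is no third element to pick from it.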

Matrix

The simplest way to subset a Matrix is to supply an index for each dimension: the row first, then the column. Let's see some examples of how to work with a Matrix.

# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
#Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
myMatrix

Using SSMS

  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a 3X3 Matrix
   myMatrix <- matrix(1:9, nrow = 3)
    # Assign names to the columns
    colnames(myMatrix) <- c("A", "B", "C")
    #Assign names to the rows
    rownames(myMatrix) <- c("X", "Y", "Z")
    # Display the Matrix
    print(myMatrix)
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO

Now that we have created our Matrix, let's see how we can access its elements.

# Access Element 5 at coordinate 2,2
myMatrix[2,2]
myMatrix["Y","B"]
# Access Element 8 at coordinate 2,3
myMatrix[2,3]
myMatrix["Y","C"]

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
  # Create a 3X3 Matrix
  myMatrix <- matrix(1:9, nrow = 3)
  # Assign names to the columns
  colnames(myMatrix) <- c("A", "B", "C")
  #Assign names to the rows
  rownames(myMatrix) <- c("X", "Y", "Z")
   # Display the Matrix
  print(myMatrix)
  print("Access Element 5 at coordinate 2,2")
  print(myMatrix[2,2])
  print(myMatrix["Y","B"])
  print("Access Element 8 at coordinate 2,3")
  print(myMatrix[2,3])
  print(myMatrix["Y","C"])
  ';
EXEC sp_execute_external_script  @language = N'R',
  @script = @rscript;  
GO
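
Leaving one of the two indexes empty selects a whole row or column. The sketch below reuses the same 3x3 matrix and also shows the drop = FALSE argument, which keeps the result as a matrix; the variable names are illustrative.

```r
myMatrix <- matrix(1:9, nrow = 3)
colnames(myMatrix) <- c("A", "B", "C")
rownames(myMatrix) <- c("X", "Y", "Z")

# An empty column index selects the whole row; the result is a named vector
row_y <- myMatrix["Y", ]

# An empty row index selects the whole column
col_b <- myMatrix[, "B"]

# drop = FALSE keeps the result as a 1x3 matrix instead of a vector
row_y_matrix <- myMatrix["Y", , drop = FALSE]

# Several rows and columns at once return a smaller matrix
block <- myMatrix[1:2, c("A", "C")]

print(row_y)
print(col_b)
print(dim(row_y_matrix))
print(block)
```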

A Matrix can only have two dimensions; a data structure with three or more dimensions is an Array. Let's define a 2X5X4 Array and see how to assign and retrieve a value from it.

# Create a multidimensional array
myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
print(myarr)  # 40 values displayed
# Assign a value to location 2,3,4
myarr[2,3,4] <- 1
print(myarr)
# Display the value at location 2,3,4
print(myarr[2,3,4])

Using SSMS

  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a multidimensional array
    myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
    print(myarr)  # 40 values displayed
    # Assign a value to location 2,3,4
    myarr[2,3,4] <- 1
    print(myarr)
    # Display the value at location 2,3,4
    print(myarr[2,3,4])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
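
As with a Matrix, an empty index in an Array selects everything along that dimension. The following sketch extracts one 2x5 slice from the same array and uses which() with arr.ind = TRUE to locate the non-zero value; the variable names are assumptions for illustration.

```r
myarr <- array(0.0, c(2, 5, 4))
myarr[2, 3, 4] <- 1

# Dimensions of the array
array_dims <- dim(myarr)

# Fix the third index and leave the first two empty:
# the result is the fourth 2x5 matrix of the array
slice4 <- myarr[, , 4]

# Locate the non-zero value; arr.ind = TRUE reports one index per dimension
nonzero <- which(myarr != 0, arr.ind = TRUE)

print(array_dims)
print(dim(slice4))
print(slice4[2, 3])
print(nonzero)
```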

Data Frames

Data Frames play an important role in data science. Let's see with an example how we can create and subset a simple Data Frame.

# Define a Data Frame with a mix of numbers and letters
df <- data.frame(x = 1:5, z = letters[1:5], w = letters[6:10], y = 5:1)
# Display the data frame contents
print(df)

print("Display the row that has c as the element in column z")
df[df$z == "c", ]

print("Display a single value")
df[df$z == "c", "y"]

print("Display 1st and 5th rows")
df[c(1, 5), ]

print("Display 2nd and 3rd columns")
df[,c(2, 3)]
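
A data frame behaves partly like a list of columns and partly like a matrix, so both styles of subsetting apply. The sketch below, which extends the example with a few more base R forms, shows $ and [[]] returning a column as a vector, [] with one index returning a data frame, and subset() as an alternative way to filter rows. The variable names are illustrative.

```r
df <- data.frame(x = 1:5, z = letters[1:5], w = letters[6:10], y = 5:1)

# List-style access: $ and [[ ]] return the column as a vector
col_vector <- df$y
col_vector2 <- df[["y"]]

# [] with a single index returns a data frame with that column
col_frame <- df["y"]

# Matrix-style access: combine two conditions to filter rows
both <- df[df$x > 2 & df$y > 1, ]

# subset() filters rows and selects columns in one call
picked <- subset(df, x > 3, select = c(x, z))

print(col_vector)
print(class(col_frame))
print(both)
print(picked)
```

subset() is convenient at the console; inside functions the explicit df[condition, ] form is generally the safer choice, because subset() evaluates column names in a non-standard way.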

Conclusion

In this tip we have learned how to subset the main R data types. In the next tip we will look at subsetting with assignment, lookup tables, matching, merging and other R commands.


About the author
Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.
