Subset in R including List, Data Frame, Matrix and Vector


By:   |   Updated: 2020-08-06   |   Comments   |   Related: More > R Language


Problem

I have loaded data into an R Data Frame or any other type of data structure; what are my options to extract, manipulate and work with my data?

Solution

R exposes a range of powerful and fast subsetting operations. Subsetting operations can be hard to learn, and they can be non-intuitive; however, learning how to subset R data is crucial to manipulate data.

In this article we will examine subsetting operators, types of subsetting, differences in behavior for different R objects like vectors, lists, and data frames.

Atomic Vectors

Let's start with the easiest subsetting type of data structure in R that are Atomic Vectors. We will examine it by using a simple example of numeric vector.

# Subsetting
x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)

Elements of the vector are in order position, for example, value 5.5 is at position five in the vector. We can access a single element by using [], let see how it works with an example.

# Get element at position 5
x[5]
result set

Using SSMS

DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Get element at position 5
    print(x[5])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Get Specific Elements

If we want to access elements at position 1, 3 and 8 of our vector x, we use the following command.

# Get elements at positions 1,3 and 8
x[c(1,3,8)]

Please note that the subsetting operation returns a Vector data type therefore; I had to use the c() command to combine the output results.

result set
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Get elements at positions 1,3 and 8
    print(x[c(1,3,8)])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Omit Values from a Vector

We can use – (negative sign) to omit the value from a vector. The following command will return all the values of vector x except the one at position 3 and 1.

# Omit elements at position 3,1
x[-c(3,1)]
result set
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Omit elements at position 3,1
    print(x[-c(3,1)])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Order Vector Elements

Order() function is used to order the Vector elements.

# Order a vector
y <- c(10,1,7,-3,8)
y
y[order(y)] 
result set
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Subsetting
    # Order a vector
    y <- c(10,1,7,-3,8)
    print(y)
    # Order Vector y
    print(y[order(y)])
';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Logical Vector

Subsetting can also be done using a logical vector for example, if we want to list element 1, 2 and 5, we can write the following subsetting logical vector.

#Subsetting using logical vector
y
#  Return elements which position correspond to TRUE
y[c(TRUE, TRUE, FALSE, FALSE, TRUE)]
result set
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
    y <- c(10,1,7,-3,8)
    #  Return elements which position correspond to TRUE
    print(y[c(TRUE, TRUE, FALSE, FALSE, TRUE)])
   ';
EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
GO	
result set

Filter Elements of a Vector

We can use logical operators like >, < and == to filter elements of a vector.

#List elements greater than 4
y[y>4]
#List elements less than 4
y[y<4]
#List elements equal to 4
y[y==7]
result set
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    y <- c(10,1,7,-3,8)
    #List elements greater than 4
    print(y[y>4])
    #List elements less than 4
    print(y[y<4])
    #List elements equal to 4
    print(y[y==7])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Assign Names to Elements in a Vector

Another interesting part is to assign names to elements in a Vector and filter the output based on names.

# Assign names to an element vector
w <- setNames(x, letters[1:10])
#Display all Elements
w
# Select elements corresponding to letter b and h
w[c("b","h")]
result set
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    x <- c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9, 10.1)
    # Assign names to an element vector
    w <- setNames(x, letters[1:10])
    #Display all Elements
    print(w)
    # Select elements corresponding to letter b and h
    print(w[c("b","h")])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

List

Subsetting List operations are like the ones used for an Atomic Vector with the difference that [] always returns a list while [[]] operators returns a component of the List. Let's see it with examples.

# Create a new List 
v1 = c(20, 30, 50) 
v2 = c("www", "mssql", "tips", ".", "com") 
v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
myList = list(v1, v2, v3, 99)
# display List
myList
result set
  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a new List 
    v1 = c(20, 30, 50) 
    v2 = c("www", "mssql", "tips", ".", "com") 
    v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
    myList = list(v1, v2, v3, 99)
    # display List
    print(myList)
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Let's use [] to retrieve the second element of the list and [[]] to get the element "tips".

# Get 2nd element of the List
myList[2]
# Get the 3rd Value of the 
myList[[2]][3]
result set
DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a new List 
    v1 = c(20, 30, 50) 
    v2 = c("www", "mssql", "tips", ".", "com") 
    v3 = c(TRUE, FALSE, TRUE, FALSE, FALSE) 
    myList = list(v1, v2, v3, 99)
    # Get 2nd element of the List
    print(myList[2])
    # Get the 3rd Value of the 
    print(myList[[2]][3])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Matrix

The simple way to subset a Matrix is to use an index for each dimension. Let's see some examples of how to work with a Matrix.

# Create a 3X3 Matrix
myMatrix <- matrix(1:9, nrow = 3)
# Assign names to the columns
colnames(myMatrix) <- c("A", "B", "C")
#Assign names to the rows
rownames(myMatrix) <- c("X", "Y", "Z")
# Display the Matrix
myMatrix
result set
  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a 3X3 Matrix
   myMatrix <- matrix(1:9, nrow = 3)
    # Assign names to the columns
    colnames(myMatrix) <- c("A", "B", "C")
    #Assign names to the rows
    rownames(myMatrix) <- c("X", "Y", "Z")
    # Display the Matrix
    print(myMatrix)
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Now that we have created our Matrix, let's see how we can access its elements.

# Access Element 5 at coordinate 2,2
myMatrix[2,2]
myMatrix["Y","B"]
# Access Element 8 at coordinate 2,3
myMatrix[2,3]
myMatrix["Y","C"]
result set
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
  # Create a 3X3 Matrix
  myMatrix <- matrix(1:9, nrow = 3)
  # Assign names to the columns
  colnames(myMatrix) <- c("A", "B", "C")
  #Assign names to the rows
  rownames(myMatrix) <- c("X", "Y", "Z")
   # Display the Matrix
  print(myMatrix)
  print("Access Element 5 at coordinate 2,2")
  print(myMatrix[2,2])
  print(myMatrix["Y","B"])
  print("Access Element 8 at coordinate 2,3")
  print(myMatrix[2,3])
  print(myMatrix["Y","C"])
  ';
EXEC sp_execute_external_script  @language = N'R',
  @script = @rscript;  
GO
result set

A Matrix can only have 2 dimensions, a three or more dimension data structure is an Array. Let's define a 2X5X4 Array and see how to assign and retrieve a value from it.

# Create a multi dimensioanl arraymyarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
print(myarr)  # 40 values displayed
#Assign a Value to location 2,3,4
myarr[2,3,4] <- 1
#D#Display Value at location 2,3,4
print(myarr)print(myarr[2,3,4])
result set
result set
  DECLARE @rscript NVARCHAR(MAX);
  SET @rscript = N'
    # Create a multi dimensioanl array
    myarr = array(0.0, c(2,5,4)) # 2x5x4 n-array
    print(myarr)  # 40 values displayed
    #Assign a Value to location 2,3,4
    myarr[2,3,4] <- 1
    print(myarr)
      #Display Value at location 2,3,4
    print(myarr[2,3,4])
   ';
  EXEC sp_execute_external_script
    @language = N'R',
    @script = @rscript;  
  GO
result set

Data Frames

Data Frames play an important role in Data Science, let's see with an example how we can create and subsetting a simple Data Frame.

# Define a Data Frame of mix Numbers and Letters
df <- data.frame(x = 1:5, z = letters[1:5],w = letters[6:10], y = 5:1 )
# Display the data frame contents
print(df)
result set
print(df)
print("Display a row that have c as element in column z ")
df[df$z == "c", ]
result set
print(df)
print("Display a Single Value ")
df[df$z == "c", "y"]
result set
prprint("Display 1st and 5th rows")
df[c(1, 5), ]prprint("Display 2nd and 3th columns")
df[,c(2, 3)]
result set

Conclusion

In the tip we have learned subsetting of the main R data types. In the next tip we will see subsetting and assignment, lookup table, matching, merging and other R commands.

Next Steps


Last Updated: 2020-08-06


get scripts

next tip button



About the author
MSSQLTips author Matteo Lorini Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.

View all my tips



Comments For This Article





download





Recommended Reading

Import Data from PDF to SQL Server

Using Simple Linear Regression to Make Predictions

Data Structures in R including Vector, Matrix, Array, List, and Data Frame

sqldf in R Example for SQL Server

Exploring Four Simple Time Series Forecasting Methods with R Examples














get free sql tips
agree to terms