Last time, we discussed how to index or subset vectors and matrices in R. Now, we will deal with indexing the other commonly used R objects: lists and data frames.
Typically, the data we work with are not as simple as vectors and matrices. Most of the time, the information we collect has more structure. Because of this, most objects that you will encounter will come in the form of lists and data frames. Data frames are especially common in statistical analysis.
If you are not yet familiar with R objects, you may check the introduction here.
Requirements
- R. If you haven’t installed R yet, you may do so here. We also made a tutorial on how to install R in Ubuntu.
- RStudio (Optional). This tutorial will use R’s IDE, RStudio. You can still follow this tutorial using only R.
Lists
To illustrate subsetting in lists, let’s first generate a list by running the following code:
a <- list(names = c("Mary", "Lucas", "June"), items = c("Pen", "Paper", "Stone", "Scissors"), numbers = c(1:6))
Calling the variable, a, shows the output below:
> a
$names
[1] "Mary"  "Lucas" "June"

$items
[1] "Pen"      "Paper"    "Stone"    "Scissors"

$numbers
[1] 1 2 3 4 5 6
There are two levels of subsetting in lists:
- element-wise
- within elements
Lists: Element-wise subsetting
The elements could be vectors, numbers, data frames, matrices, and others. Suppose you want to extract just the collection of items in the list, a.
The collection of items is the second element in the list. You can extract it using the bracket operator as you would in vectors:
> a[2]
$items
[1] "Pen"      "Paper"    "Stone"    "Scissors"
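You can also extract the element using the dollar ($) operator, now specifying the name of the list element. Note that the results are not quite the same: the single bracket returns a list of length one that still carries the element's name, while the dollar operator returns the element itself, here a character vector:
> a$items
[1] "Pen"      "Paper"    "Stone"    "Scissors"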
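One way to see how the operators differ is to check the class of each result. The sketch below rebuilds the list a from above; the names single, double, and dollar are illustrative only and do not appear elsewhere in this tutorial.

```r
# Rebuild the example list so the snippet runs on its own
a <- list(names = c("Mary", "Lucas", "June"),
          items = c("Pen", "Paper", "Stone", "Scissors"),
          numbers = 1:6)

single <- a[2]      # single bracket: a list of length one
double <- a[[2]]    # double bracket: the element itself
dollar <- a$items   # dollar operator: also the element itself

class(single)              # "list"
class(double)              # "character"
identical(double, dollar)  # TRUE
```

If you need the vector itself, for example to pass it to a function that expects a character vector, the double bracket or dollar form is usually the one to reach for.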
Lists: Subsetting within elements
If, instead, you want to extract the third item in the item list, you have to add another layer of subsetting. This can be done using either the bracket or dollar operator as shown below:
> a[[2]][3]
[1] "Stone"
> a$items[3]
[1] "Stone"
Using the bracket operator, you have to specify which element you will be extracting from by enclosing the index with double brackets ( [[ ]] ). Then, you have to specify the index of the element you are trying to extract with a bracket operator as you would in regular vectors.
Using the dollar operator, you instead have to specify the name of the element and layer it with a single bracket operator enclosing the index (or indices) of the sub-elements that you are trying to extract.
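The two approaches described above can be sketched side by side. This example again rebuilds the list a, and also shows that double brackets accept the element's name in place of its position; the variable names by_position, by_name, by_dollar, and first_and_third are made up for illustration.

```r
# Rebuild the example list so the snippet runs on its own
a <- list(names = c("Mary", "Lucas", "June"),
          items = c("Pen", "Paper", "Stone", "Scissors"),
          numbers = 1:6)

by_position <- a[[2]][3]        # second element, then its third item
by_name     <- a[["items"]][3]  # same, selecting the element by name
by_dollar   <- a$items[3]       # same, using the dollar operator

# Several sub-elements at once: pass a vector of indices
first_and_third <- a$items[c(1, 3)]

by_position      # "Stone"
first_and_third  # "Pen" "Stone"
```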
Data Frames
Data frames are subset in much the same way as lists, though there are slight differences. Let’s create a data frame and see for ourselves:
b <- data.frame(
  A = 1:4,
  B = 5:8,
  row.names = c("Mary", "Lucas", "Mattie", "June"))
> b
       A B
Mary   1 5
Lucas  2 6
Mattie 3 7
June   4 8
Now, let’s look at how subsetting occurs in data frames.
Data Frames: Element-wise subsetting
You can use either the bracket operator or the dollar operator to get the column you want. For instance, if we want to extract the second column of the data frame:
> b$B
[1] 5 6 7 8
#if you want to extract the column retaining the names
> b[2]
       B
Mary   5
Lucas  6
Mattie 7
June   8
#if you want to extract the column as a vector
> b[[2]]
[1] 5 6 7 8
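Columns can also be selected by name rather than by position. The sketch below rebuilds the data frame b and compares the shapes of the results; the variable names col_df, col_vec, col_mat, and col_keep are illustrative only.

```r
# Rebuild the example data frame so the snippet runs on its own
b <- data.frame(A = 1:4, B = 5:8,
                row.names = c("Mary", "Lucas", "Mattie", "June"))

col_df   <- b["B"]                  # one-column data frame, row names kept
col_vec  <- b[["B"]]                # plain integer vector
col_mat  <- b[, "B"]                # matrix-style; drops to a vector by default
col_keep <- b[, "B", drop = FALSE]  # matrix-style, but stays a data frame

is.data.frame(col_df)        # TRUE
identical(col_vec, b$B)      # TRUE
identical(col_mat, col_vec)  # TRUE
is.data.frame(col_keep)      # TRUE
```

The drop = FALSE argument is handy when later code expects a data frame regardless of how many columns were selected.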
Data Frames: Subsetting within elements
If you want to get the fourth element in the second column, there are multiple ways to do so using either the bracket or dollar operator:
> b$B[4]
[1] 8
#treating the data as if it was a matrix:
> b[4,2]
[1] 8
#treating the data as if it was a list of columns
> b[[2]][4]
[1] 8
Since data frames are like structured matrices, you can use matrix indices (row, then column) to subset within an element. You can also treat them as lists of columns: extract a column with double brackets, then use a single bracket operator to extract a particular element of that column.
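Because the example data frame has row names, matrix-style indices can use names as well as positions. The following is a small sketch under that assumption; the variable names june_b, fourth_row, and two_rows are hypothetical.

```r
# Rebuild the example data frame so the snippet runs on its own
b <- data.frame(A = 1:4, B = 5:8,
                row.names = c("Mary", "Lucas", "Mattie", "June"))

june_b     <- b["June", "B"]               # row and column by name
fourth_row <- b[4, ]                       # leave the column index empty for a whole row
two_rows   <- b[c("Mary", "June"), "A"]    # several rows, one column

june_b     # 8
two_rows   # 1 4
```

Leaving one side of the comma empty selects everything along that dimension, so b[4, ] returns the fourth row with all of its columns as a one-row data frame.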
Conclusion
This wraps up our tutorial on how to subset or index commonly used R objects. You can explore R yourself so that you can identify what subsetting methods and what kind of objects you are most comfortable with.