Statistics For Dummies: Indexing and Subsetting In R [Part 1 of 2] : Vectors And Matrices

Previously, we talked about the objects in R and how they are created. Now, we will discuss how information can be extracted from these objects. This process is also known as indexing or subsetting.

The ability to access information is important since there are times that we only need a specific data point or subset of a collection of measurements for further processing or analysis.

Luckily in R, doing so is fairly systematic. In this lesson, we will go over two of the object types we have discussed — vectors and matrices.

If you’re not yet familiar with these objects in R, you may check the introduction here before proceeding with this tutorial.

Requirements

  • R. If you haven’t installed R yet, you may do so here. We also made a tutorial on how to install R in Ubuntu.
  • RStudio (Optional). This tutorial uses RStudio, an IDE for R. You can still follow this tutorial using only R.

Vectors

For this illustration, let’s generate a sequence of numbers from 6 to 15 using this line:

a <- 6:15

 

Note that the colon (:) operator starts at the first value and increments by 1, stopping once the ending value is reached or the next step would exceed it.

If you call the variable a, the following output will be shown:

> a
[1]  6  7  8  9 10 11 12 13 14 15

 

As you can see, 1 was added successively, starting from 6, until the value of 15 was reached.
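As a small sketch of how the colon operator behaves, the lines below repeat the sequence from this lesson and add two extra cases that are not part of the original example (the variables x and y are made up for illustration): a non-integer start and a descending sequence.

```r
# The sequence used in this lesson
a <- 6:15
length(a)   # 10 values

# Assumed extra case: a non-integer start still steps by 1
# and stops before the end value would be exceeded
x <- 1.5:4
x           # 1.5 2.5 3.5

# Assumed extra case: the colon operator counts down
# when the start is larger than the end
y <- 3:1
y           # 3 2 1
```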

Now let’s extract a single value. R counts positions starting from 1, so if we want the third value of the vector a, we can get it by using the bracket operator as shown in the following line:

a[3]

 

This will yield the following output:

> a[3]
[1] 8

 

But what if we want to extract multiple values? For instance, what if we want to extract the third, seventh, and ninth values of the data set?

In this case, we pass a vector of positions, built with c(), inside the bracket operator:

a[c(3, 7, 9)]

 

You can confirm that this line will output the following numbers, which are exactly what we need:

> a[c(3, 7, 9)]
[1]  8 12 14
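Any vector of positions works inside the brackets, not only one built with c(). The following sketch reuses the vector a from above. The specific positions are chosen only for illustration, and the out-of-range case is an extra that the original example does not cover.

```r
a <- 6:15

# A range of positions: the second through fourth values
a[2:4]          # 7 8 9

# Values come back in the order requested, and positions may repeat
a[c(9, 3, 3)]   # 14 8 8

# Assumed extra case: a position past the end of the vector gives NA
a[11]           # NA
```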

 

What if, instead, you want to exclude particular values? Let’s say we want to omit the first and last values of the vector.

We can do this by adding a minus sign (-) in the argument we place inside the bracket operator:

a[-c(1, length(a))]

 

Here we introduce another useful function, the length() function. This simply outputs the length of the vector argument we place in it. This is, of course, way easier than actually counting the number of elements in a vector.

The line above will output the following:

> a[-c(1, length(a))]
[1]  7  8  9 10 11 12 13 14

 

We have successfully removed the first and last data points in the vector.
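A few more variations on negative indexing are sketched below, again using the vector a. The head() and tail() calls are base R functions offered here as an alternative spelling, not something the original example relies on, and res is a throwaway variable used only to capture the error.

```r
a <- 6:15

# Drop only the first value
a[-1]                    # 7 through 15

# Drop the first three values; the parentheses matter,
# because -1:3 would mean the sequence -1, 0, 1, 2, 3
a[-(1:3)]                # 9 through 15

# Same result as a[-c(1, length(a))], written with head() and tail()
head(tail(a, -1), -1)    # 7 through 14

# Positive and negative positions cannot be mixed in one index
res <- tryCatch(a[c(-1, 2)], error = function(e) "error")
res                      # "error"
```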

Matrices

The principles we learned for subsetting vectors carry over almost directly to matrices.

For this example, let’s generate a 3×3 matrix by executing the following line:

b <- matrix(c(1:9), nrow = 3, ncol = 3)

 

This will yield the following matrix output:

> b
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

 

In vectors, we just specify one index since we only have one dimension. Naturally, we will specify two indices for matrices. The first index will correspond to the row. Meanwhile, the second index will correspond to the column.

Let’s start things simple by extracting just the value in the second row and second column. We can do this by executing the next line:

b[2,2]

 

The following output will be displayed after running the previous line:

> b[2,2]
[1] 5
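Because the row index always comes first, swapping the two indices generally selects a different cell. The short sketch below uses the same matrix b. The dim(), nrow(), and ncol() calls are base R helpers added here for illustration.

```r
b <- matrix(1:9, nrow = 3, ncol = 3)

# Dimensions are reported as rows first, then columns
dim(b)    # 3 3
nrow(b)   # 3
ncol(b)   # 3

# Row 2, column 3
b[2, 3]   # 8

# Row 3, column 2: a different cell
b[3, 2]   # 6
```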

 

If we want to extract an entire row, we can do so by leaving the column index blank. For example, if we want to extract the second row:

b[2, ]

 

This will output:

> b[2,]
[1] 2 5 8

 

Naturally, we can extract an entire column by doing the opposite. For example, extracting the third column can be done by executing the succeeding line:

b[ ,3]

 

Afterwards, the following output will be displayed:

> b[ ,3]
[1] 7 8 9
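One detail worth noticing: a single row or column comes back as a plain vector rather than as a matrix. The sketch below goes slightly beyond the example above and shows the drop argument of R's bracket operator, which keeps the matrix structure when set to FALSE. The names col3 and col3_m are made up for illustration.

```r
b <- matrix(1:9, nrow = 3, ncol = 3)

# Default behaviour: the third column becomes a plain vector
col3 <- b[, 3]
is.matrix(col3)      # FALSE
length(col3)         # 3

# With drop = FALSE the result stays a 3 x 1 matrix
col3_m <- b[, 3, drop = FALSE]
is.matrix(col3_m)    # TRUE
dim(col3_m)          # 3 1
```

Keeping the matrix structure can matter when later code expects two dimensions, for example when calling nrow() or ncol() on the result.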

 

Multiple rows or columns may be selected as well. This can be performed by using vector arguments. For example, if we want to extract the first and third rows:

b[c(1,3),]

 

This will yield the following output:

> b[c(1,3),]
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    3    6    9

 

This can easily be done with columns as well. You can do that as an exercise.
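If you would like to check your answer, here is one possible solution for the first and third columns, assuming the same matrix b as above (cols is a hypothetical variable name):

```r
b <- matrix(1:9, nrow = 3, ncol = 3)

# Leave the row index blank and pass a vector of column positions
cols <- b[, c(1, 3)]
cols
#      [,1] [,2]
# [1,]    1    7
# [2,]    2    8
# [3,]    3    9
```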

By using vectors of indices, you can get really specific on the data subset you want to extract.

For instance, suppose you want the values in the first and third columns, but only from the first and third rows. In other words, you want everything except the second row and second column of the matrix:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

 

You can execute the following code:

b[c(1,3),c(1,3)]

 

This will yield the desired output:

> b[c(1,3),c(1,3)]
     [,1] [,2]
[1,]    1    7
[2,]    3    9

 

Using the minus operator as shown below will also yield the same output:

> b[-2,-2]
     [,1] [,2]
[1,]    1    7
[2,]    3    9
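Rather than comparing the two printed outputs by eye, you can let R confirm that they match. This is a minimal sketch using base R's identical(). The names keep and without are made up for illustration.

```r
b <- matrix(1:9, nrow = 3, ncol = 3)

# Select rows and columns 1 and 3 explicitly
keep <- b[c(1, 3), c(1, 3)]

# Exclude row 2 and column 2 instead
without <- b[-2, -2]

identical(keep, without)   # TRUE
```

The two forms agree here only because b is a 3×3 matrix. On a larger matrix, excluding the second row and column would leave more rows and columns than selecting just the first and third.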

Conclusion

In this lesson, we learned to subset vectors and matrices. In the next lesson, we will learn how we can subset lists and data frames. These objects are indexed in a manner similar to vectors and matrices. However, their differing structure will require new kinds of operators.

