R is a programming language and software environment designed with statistical computing and graphics in mind.
In R, information is stored in entities called objects. It’s through these objects that you can perform data manipulations, operations, and analysis in R.
Here, we will introduce the main types of R objects and how you can create each one.
Requirements
- R. If you haven’t installed R yet, you may do so here. We also made a tutorial on how to install R in Ubuntu.
- RStudio (Optional). This tutorial will use R’s IDE, RStudio. You can still follow this tutorial using only R.
Modes
Before we talk about the types of objects, we first discuss modes. The mode is the classification of a data point or value being stored. Here are the commonly used modes in statistical applications:
- numeric
double: any real number can be stored as a double.
integer: a whole number with no decimal component. Whole numbers are stored as doubles by default; add the L suffix (for example, 1L) to store one as an integer.
- logical
logical data comes in only two values, TRUE and FALSE. These two keywords, written without quotation marks, are recognized by R as logical values. Similar to other programming languages, TRUE corresponds to 1 and FALSE corresponds to 0 when used in arithmetic.
- character
a character value. Alphanumeric characters and most special symbols enclosed in quotation marks ("like this") will be recognized by R as character data.
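To see these modes in practice, here is a small sketch using the base R functions typeof() and mode(). The values are arbitrary examples chosen for illustration; they are not part of the tutorial's own data.

```r
# typeof() reports how a value is stored; mode() reports the broader mode.
typeof(1.5)     # "double"
typeof(1)       # "double": whole numbers are doubles unless marked with L
typeof(1L)      # "integer"
mode(1L)        # "numeric": integers and doubles share the numeric mode
typeof(TRUE)    # "logical"
typeof("a")     # "character"

# Logical values behave as 1 and 0 in arithmetic.
TRUE + TRUE                 # 2
sum(c(TRUE, FALSE, TRUE))   # 2
```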
Now that we know the modes, we begin the discussion of the different types of objects in R.
(Atomic) Vectors
Atomic vectors, or simply vectors, are one-dimensional objects that possess components of the same mode.
To create vectors in R, we use the c() function. The following examples show how to create different types of vectors. You can run each line by copying it and pasting it into the R console.
a <- c(1.1, 2.3, 3.7) # double vector
b <- c(1L, 2L, 3L) # integer vector
c <- c(TRUE, TRUE, FALSE) # logical vector
d <- c("1", "a", "?") # character vector
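Because every component of a vector must share one mode, R converts the values to a common type when you mix them inside c(). The sketch below illustrates this behavior; the variable names (mixed_num, mixed_lgl, mixed_chr) are made up for this example.

```r
# Mixing modes inside c() forces every element to one common type.
mixed_num <- c(1L, 2.5)        # integer + double    -> double
mixed_lgl <- c(TRUE, 2L)       # logical + integer   -> integer
mixed_chr <- c(1, "a", TRUE)   # anything + character -> character

typeof(mixed_num)   # "double"
typeof(mixed_lgl)   # "integer"
typeof(mixed_chr)   # "character"
mixed_chr           # "1" "a" "TRUE"
```

If you need to keep values of different modes side by side without conversion, a list is the better fit.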
Lists
Lists are also one-dimensional. However, each element can be of any type. You can include vectors or even nest a list within the list.
Creating a list is done using the list() function.
#creating a list
e <- list(c(1, 2, 3), 1, "a", TRUE)
The list above contains all modes we have discussed so far and even a vector. Calling the variable e, where this list is stored, gives the following output:
> e
[[1]]
[1] 1 2 3

[[2]]
[1] 1

[[3]]
[1] "a"

[[4]]
[1] TRUE
As you can see, a list simply treats each object input as a single element of the list. With this, it is possible to compile objects of different types and dimensions.
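As a quick check of this behavior, the sketch below rebuilds the list e and pulls elements back out with base R indexing. The nested list (named nested here) is a hypothetical addition for illustration only.

```r
e <- list(c(1, 2, 3), 1, "a", TRUE)

length(e)       # 4: the vector c(1, 2, 3) counts as one element
e[[1]]          # the numeric vector 1 2 3
e[[1]][2]       # 2: the second value inside that vector
class(e[1])     # "list": single brackets return a smaller list
class(e[[1]])   # "numeric": double brackets return the element itself

# A list can itself be an element of another list.
nested <- list(e, "outer")
nested[[1]][[3]]   # "a"
```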
Matrices
Matrices belong to the class of data structures known as arrays. In essence, arrays add dimension to an atomic vector. In the context of a matrix, a vector is given two dimensions: row and column.
For simplicity, we will only discuss matrices in this lesson. Matrices are formed using the matrix() function.
For instance, suppose you want to create a matrix corresponding to the following table:
2 1
3 2
5 3
The matrix() function will first assign the vector's values from top to bottom in the first column (2, 3, 5).
Afterwards, it will move to the next column to the right and assign values from top to bottom once more (1, 2, 3).
With this logic, we should form the vector that will be shaped into a matrix by specifying values from top to bottom and then left to right:
f <- matrix( c(2,3,5,1,2,3) ,__ , __) #incomplete: DO NOT execute
Then, we complete the line of code by specifying the number of rows and columns using the nrow = and ncol = arguments:
f <- matrix(c(2,3,5,1,2,3) ,nrow = 3 , ncol = 2) #execute
Calling the variable f where the matrix is stored will output the following:
> f
     [,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]    5    3
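The column-by-column fill and the idea that a matrix is a vector with dimensions can both be checked directly. This is a minimal sketch using base R; the names values, v, and f_by_row are introduced only for this example, and byrow = TRUE is shown as an alternative fill order that the text above does not cover.

```r
values <- c(2, 3, 5, 1, 2, 3)

# Default fill is column by column, as described above.
f <- matrix(values, nrow = 3, ncol = 2)
dim(f)     # 3 2
f[3, 1]    # 5: row 3, column 1

# A matrix is a vector with a dim attribute; setting dim gives the same result.
v <- values
dim(v) <- c(3, 2)
identical(f, v)   # TRUE

# byrow = TRUE fills row by row instead, so the values are listed in table order.
f_by_row <- matrix(c(2, 1, 3, 2, 5, 3), nrow = 3, ncol = 2, byrow = TRUE)
identical(f, f_by_row)   # TRUE
```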
Data Frames
An array or matrix is just like an atomic vector in the sense that all of its elements are of the same type.
A data frame, on the other hand, is like a two-dimensional list.
A data frame accepts equal-length atomic vectors as inputs, one per column. These atomic vectors may differ in mode. This makes it a more structured variant of a list.
A data frame is created using the data.frame() function. Here is an example data frame:
#execute these lines all at once
g <- data.frame(
  x = c(1, 2, 3),
  y = c("a", "b", "c"),
  z = c(TRUE, FALSE, TRUE)
)
Calling the variable g displays the following output:
> g
  x y     z
1 1 a  TRUE
2 2 b FALSE
3 3 c  TRUE
Data frames are the most commonly used data type in statistical analysis because they provide flexibility without compromising data structure.
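To make the list-like structure concrete, the sketch below rebuilds g and inspects it with base R functions. The final lines, which use a hypothetical variable named bad, show what we would expect when the columns do not have compatible lengths.

```r
g <- data.frame(
  x = c(1, 2, 3),
  y = c("a", "b", "c"),
  z = c(TRUE, FALSE, TRUE)
)

is.list(g)          # TRUE: a data frame is built on a list
dim(g)              # 3 3: three rows, three columns
sapply(g, length)   # every column has length 3
g$x                 # 1 2 3: one column, returned as a vector
g[2, "z"]           # FALSE: row 2 of column z
class(g$z)          # "logical"

# Columns whose lengths do not match (and cannot be recycled evenly) are rejected.
bad <- try(data.frame(x = c(1, 2, 3), y = c("a", "b")), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```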
Conclusion
This wraps up the short introduction to R objects. It is intended to familiarize you with how these objects work and how they differ from each other.
Of course, there are complexities that you will still have to learn to fully master these objects. We will discuss them in future lessons.