During data manipulation, exploration, and even analysis, code can get complicated when you put one function call inside another in order to accomplish a task in one swoop. These are called nested functions.

A nested function call may look something like this:

function_1(function_2(function_3(function_4(x))))

Not only can this confuse you while you are coding, it is also hard on the eyes. When you forget an operation, you will have to scour a flood of parentheses to add it. What a nightmare.

In R, a cleaner way of dealing with nested functions is the pipe operator, %>%. How is this operator used? Let’s break it down.

Requirements

  • R (>= 3.1.2)
  • magrittr or dplyr package

Preliminaries

If you haven’t yet, install the magrittr or dplyr package in R by executing either of the following two lines of code.

install.packages("magrittr")
install.packages("dplyr")

Afterwards, load the package that you installed using the appropriate line below:

library(magrittr)
library(dplyr)

Using the pipe operator

The pipe operator, %>%, passes its left-hand side as the first argument of the function on its right-hand side.
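As a minimal sketch of this rule, the snippet below compares a piped call with the equivalent ordinary call. The vector `v` and the use of `round()` are only illustrative choices, not part of the original example; the sketch assumes magrittr is installed (dplyr provides the same operator).

```r
library(magrittr)  # provides %>%; loading dplyr instead also works

v <- c(1.234, 5.678)  # hypothetical values, for illustration only

# v %>% round(1) is evaluated as round(v, 1):
# the left-hand side becomes the first argument of round()
piped  <- v %>% round(1)
nested <- round(v, 1)

identical(piped, nested)
```

Any further arguments written inside the parentheses on the right-hand side simply follow the piped value.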

For demonstration, let’s create an integer vector, x, by executing the following lines:

x <- -5:5
x

Now, this will generate the following output:

> x
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

What if, just as an example, you have to get the mean of the absolute differences between the data values and their mean (the mean absolute deviation)?

That’s confusing, so let’s decompose it:

  1. We first get the difference of the values of x and the mean of x, in other words, x – mean(x),
  2. We compute the absolute value of the difference above, i.e., |x – mean(x)|
  3. We compute the mean of the absolute differences, mean {|x – mean(x)|}
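The three steps above can also be sketched with one intermediate variable per step, which makes the order of operations explicit before any nesting or piping is introduced. The variable names `deviations`, `abs_deviations`, and `mad_value` are hypothetical and chosen only for this illustration.

```r
x <- -5:5

# Step 1: difference between each value and the mean of x
deviations <- x - mean(x)

# Step 2: absolute value of those differences
abs_deviations <- abs(deviations)

# Step 3: mean of the absolute differences
mad_value <- mean(abs_deviations)
mad_value
```

This works, but it leaves several throwaway objects in the workspace, which is one reason people reach for nesting or piping instead.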

We can perform this using the following nested function:

mean(abs(x-mean(x)))

[1] 2.727273

Realistically, this isn’t so bad, but as we add more functions, the nesting quickly becomes hard to read.

Using the pipe operator, we can arrange the operations in the order in which we typically think about them:

1. We first get the difference of the values of x and the mean of x, in other words, x – mean(x),

(x - mean(x))

2. We compute the absolute value of the difference above, i.e., |x – mean(x)|

(x - mean(x)) %>% abs

3. We compute the mean of the absolute differences, mean {|x – mean(x)|}

(x - mean(x)) %>% abs %>% mean

[1] 2.727273

We see that the result is the same as that of the nested function.

Instead of going from the innermost function to the outermost function, the pipe operator lets us read the operations from left to right, in the sequence in which our minds usually operate.

Because each step is simply appended to the end of the chain, we can easily add more operations if we need to.
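As a small illustration of extending the chain, the sketch below appends one more step, rounding to two decimal places. The rounding step is an assumption added purely for demonstration; it is not part of the original calculation.

```r
library(magrittr)  # provides %>%; loading dplyr instead also works

x <- -5:5

# Same chain as before, with round(2) appended as a fourth step
result <- (x - mean(x)) %>% abs %>% mean %>% round(2)
result
# [1] 2.73
```

Note that the parentheses around x - mean(x) are needed: %>% binds more tightly than the subtraction operator, so without them only mean(x) would be piped.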

And that’s about it. The pipe operator is extremely versatile: you can use it not only for vectors, as shown here, but also for other R objects that you intend to pass through multiple operations in one go.
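For instance, a data frame can be piped through dplyr verbs in the same way. The sketch below is only an illustration under stated assumptions: it uses mtcars, a data set that ships with R, and the choice of filtering on four-cylinder cars and averaging mpg is arbitrary.

```r
library(dplyr)

# mtcars is a built-in data set, used here only as a stand-in data frame
result <- mtcars %>%
  filter(cyl == 4) %>%               # keep the four-cylinder cars
  summarise(mean_mpg = mean(mpg))    # average their miles per gallon

result
```

Each verb receives the data frame produced by the previous step as its first argument, so the chain reads top to bottom in the order the operations are applied.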
