3.11 Pipeline Identity Operator

A handy trick when building a pipeline is to use what is effectively an identity operator. An identity operator takes the data being communicated through the pipeline and passes it on, unchanged, to the next operator in the pipeline. An effective identity operator is constructed with the syntax {.}. In R terms this is a compound statement (a braced expression) containing just the period, and within a magrittr pipeline the period represents the data arriving from the left-hand side. The expression therefore evaluates to the data itself, passing it through without processing it.
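As a quick check of this behaviour, the following minimal sketch assumes only that the magrittr package is installed. The vector x and the result y are hypothetical names used purely for illustration. It suggests that ending a pipeline with {.} leaves the result unchanged:

```r
library(magrittr)

# A small illustrative vector (hypothetical data).
x <- c(3, 1, 2)

# Sort the vector and then pass it through the identity operator.
y <- x %>%
  sort() %>%
  {.}

# The result should match the pipeline without the trailing {.}.
identical(y, sort(x))
## [1] TRUE
```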

Why is this useful? Whilst we are building our pipeline, one line at a time, we will want to put a pipe at the end of each line, but of course we cannot do so if there is no following operator. Also, whilst debugging a pipeline, we may want to execute only a part of it, and so the identity operator is handy there too.
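The debugging case might look something like the following sketch, again assuming only the magrittr package and using a hypothetical numeric vector. Because every line ends with a pipe and the pipeline is terminated by {.}, any single step can be commented out without touching its neighbours:

```r
library(magrittr)

# Temporarily disable the final step by commenting out its line.
# The trailing {.} keeps the pipeline syntactically complete.
result <- c(4, 9, 16) %>%
  sqrt() %>%
  # sum() %>%
  {.}

result
## [1] 2 3 4
```

Removing the comment marker restores the sum() step, with no other line needing to change.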

As a typical scenario we might be in the process of building a pipeline as here, and choose to include the %>% at the end of the line of the dplyr::select() operation, followed by the identity operator:

ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  {.}

## # A tibble: 226,868 × 4
##    rainfall min_temp max_temp sunshine
##       <dbl>    <dbl>    <dbl>    <dbl>
##  1      0.6     13.4     22.9       NA
##  2      0        7.4     25.1       NA
##  3      0       12.9     25.7       NA
##  4      0        9.2     28         NA
##  5      1       17.5     32.3       NA
##  6      0.2     14.6     29.7       NA
##  7      0       14.3     25         NA
##  8      0        7.7     26.7       NA
##  9      0        9.7     31.9       NA
## 10      1.4     13.1     30.1       NA
## # ℹ 226,858 more rows

We then add the next operation into the pipeline without having to modify any of the code already present:

ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  summary() %>%
  {.}

##     rainfall          min_temp        max_temp        sunshine     
##  Min.   :  0.000   Min.   :-8.70   Min.   :-4.10   Min.   : 0.00   
##  1st Qu.:  0.000   1st Qu.: 7.50   1st Qu.:17.90   1st Qu.: 4.90   
##  Median :  0.000   Median :11.90   Median :22.60   Median : 8.50   
##  Mean   :  2.348   Mean   :12.09   Mean   :23.21   Mean   : 7.63   
##  3rd Qu.:  0.600   3rd Qu.:16.80   3rd Qu.:28.20   3rd Qu.:10.60   
##  Max.   :474.000   Max.   :33.90   Max.   :48.90   Max.   :14.50   
##  NA's   :6775      NA's   :3800    NA's   :3630    NA's   :132637

And so on. Whilst this appears a minor convenience, it becomes quite a handy trick over time as we build more pipelines.

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0