3.11 Pipeline Identity Operator
A handy trick when building a pipeline is to use what is effectively the identity operator. An identity operator simply takes the data being communicated through the pipeline and passes it on, unchanged, to the next operator in the pipeline. An effective identity operator is constructed with the syntax {.}. In R terms this is a compound statement containing just the period, where the period represents the data. Because the value of a compound statement is the value of its last expression, the data passes through without any processing.
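A minimal sketch of this behaviour, assuming the %>% operator comes from the magrittr package (dplyr re-exports the same operator). The vector x is a hypothetical stand-in for the data flowing through a pipeline:

```r
library(magrittr)

# Hypothetical data standing in for whatever flows through the pipeline.
x <- c(3, 1, 2)

# Pipe the data into the identity operator: the braces form a compound
# statement whose only expression is the period, i.e. the incoming data.
y <- x %>% {.}

identical(x, y)  # TRUE: the data is passed on unchanged
```

Note that this relies on how the magrittr pipe treats a right-hand side enclosed in braces. The native R pipe |> does not accept {.} in this position.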
Why is this useful? Whilst we are building our pipeline, one line at a time, we will want to put a pipe at the end of each line, but of course we cannot do so if there is no following operator, since R would then treat the expression as incomplete. Also, whilst debugging a pipeline, we may want to execute only a part of it, and so the identity operator is handy there too.
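The debugging use can be sketched as follows, under the assumption that we are working with R's built-in mtcars dataset and base functions rather than the dataset used in this book. With {.} terminating the pipeline, any intermediate step can be commented out without touching the pipes on the other lines:

```r
library(magrittr)

# Each step ends with a pipe, and {.} terminates the pipeline, so a step
# can be disabled by commenting out its line and nothing else.
result <- mtcars %>%
  subset(cyl == 4) %>%
  # head(3) %>%          # temporarily disabled whilst debugging
  {.}

nrow(result)  # 11: all four-cylinder cars, since head(3) is disabled
```

Restoring the commented line re-enables that step, again without editing any other line.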
As a typical scenario, we might be in the process of building a pipeline as here and find it convenient to include the %>% at the end of the line of the dplyr::select() operation:
ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  {.}
## # A tibble: 176,747 × 4
## rainfall min_temp max_temp sunshine
## <dbl> <dbl> <dbl> <dbl>
## 1 0.6 13.4 22.9 NA
## 2 0 7.4 25.1 NA
## 3 0 12.9 25.7 NA
## 4 0 9.2 28 NA
## 5 1 17.5 32.3 NA
## 6 0.2 14.6 29.7 NA
## 7 0 14.3 25 NA
## 8 0 7.7 26.7 NA
## 9 0 9.7 31.9 NA
## 10 1.4 13.1 30.1 NA
## # … with 176,737 more rows
We then add the next operation into the pipeline without having to modify any of the code already present:
ds %>%
  select(rainfall, min_temp, max_temp, sunshine) %>%
  summary() %>%
  {.}
## rainfall min_temp max_temp sunshine
## Min. : 0.000 Min. :-8.70 Min. :-4.10 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 7.50 1st Qu.:18.10 1st Qu.: 4.90
## Median : 0.000 Median :12.00 Median :22.80 Median : 8.50
## Mean : 2.241 Mean :12.15 Mean :23.36 Mean : 7.66
## 3rd Qu.: 0.600 3rd Qu.:16.90 3rd Qu.:28.40 3rd Qu.:10.60
## Max. :474.000 Max. :33.90 Max. :48.90 Max. :14.50
## NA's :4318 NA's :2349 NA's :2105 NA's :93859
And so on. It may appear a minor convenience, but as we build more pipelines it becomes quite a handy trick.
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0
