3.12 Pipeline Syntactic Sugar
20210103 For the technically minded, what is actually happening is that the pipe operator lets us write our code in a different syntax (i.e., a different way of writing the sentences), which R then maps into the traditional functional expression. The pipeline is easier to read, and that matters because we write our code for others (and for ourselves later on) to read.
The pipeline below combines a series of commands to operate on the dataset.
# Summarise observations with little or no rainfall.
ds %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()
## min_temp max_temp rainfall sunshine
## Min. :-8.70 Min. :-2.10 Min. :0.00000 Min. : 0.00
## 1st Qu.: 7.20 1st Qu.:19.00 1st Qu.:0.00000 1st Qu.: 6.10
## Median :11.80 Median :23.70 Median :0.00000 Median : 9.30
## Mean :11.91 Mean :24.18 Mean :0.05973 Mean : 8.35
## 3rd Qu.:16.60 3rd Qu.:29.10 3rd Qu.:0.00000 3rd Qu.:11.00
## Max. :33.90 Max. :48.90 Max. :0.90000 Max. :14.50
....
Contrast this with how it is mapped by R into the functional construct below, which is how we might have traditionally written it. For many of us it will take quite a bit of effort to parse this traditional functional form of the expression, and so to understand what it is doing. The pipeline alternative above provides a clearer narrative.
# Functional form equivalent to the pipeline above.
summary(filter(select(ds,
                      min_temp, max_temp, rainfall, sunshine),
               rainfall < 1))
## min_temp max_temp rainfall sunshine
## Min. :-8.70 Min. :-2.10 Min. :0.00000 Min. : 0.00
## 1st Qu.: 7.20 1st Qu.:19.00 1st Qu.:0.00000 1st Qu.: 6.10
## Median :11.80 Median :23.70 Median :0.00000 Median : 9.30
## Mean :11.91 Mean :24.18 Mean :0.05973 Mean : 8.35
## 3rd Qu.:16.60 3rd Qu.:29.10 3rd Qu.:0.00000 3rd Qu.:11.00
## Max. :33.90 Max. :48.90 Max. :0.90000 Max. :14.50
....
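To check the equivalence of the two forms directly, here is a minimal sketch. It assumes the dplyr package is installed, and it uses a small made-up data frame called toy (a hypothetical stand-in for ds, with invented values) so that it can be run on its own.

```r
library(dplyr)

# Hypothetical toy data standing in for ds; the values are invented
# purely for illustration and are not taken from the weather dataset.
toy <- data.frame(
  min_temp = c(7.2, 11.8, 16.6, 3.4),
  max_temp = c(19.0, 23.7, 29.1, 15.2),
  rainfall = c(0.0, 0.4, 5.2, 12.0),
  sunshine = c(6.1, 9.3, 11.0, 2.5),
  wind     = c(10, 20, 30, 40)
)

# Pipeline form: read top to bottom, one step per line.
piped <- toy %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()

# Functional form: read from the innermost call outwards.
nested <- summary(filter(select(toy,
                                min_temp, max_temp, rainfall, sunshine),
                         rainfall < 1))

# Compare the two results.
print(identical(piped, nested))
```

Both forms apply the same three functions in the same order, so we would expect the comparison to report that the results match. Only the way we write the steps differs.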
Anything that improves the readability of our code is useful. Computers are quite capable of doing the hard work of transforming a simpler sentence into this much more complex-looking sentence for their own purposes. For our purposes, let’s keep it simple for others to follow.
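One way to watch this transformation happen is with the native pipe |> that ships with R itself. This is an assumption beyond the examples above, which use %>%, and it requires R 4.1 or later. The parser rewrites a native pipeline into the nested call before anything is evaluated, so we can quote the expression and inspect what R actually sees. The sketch uses base R's subset() rather than filter() so that no packages are needed, and ds is only a symbol here since nothing is evaluated.

```r
# Requires R >= 4.1 for the native pipe |>.
# quote() captures the expression without evaluating it, so ds and
# rainfall need not exist for this to run.
piped_expr <- quote(ds |> subset(rainfall < 1) |> summary())

# The same computation written by hand in the functional form.
nested_expr <- quote(summary(subset(ds, rainfall < 1)))

# Show the expression as R has parsed it.
print(piped_expr)

# Compare the parsed pipeline with the hand-written nested call.
print(identical(piped_expr, nested_expr))
```

Note that %>% is implemented differently: it is an ordinary function that arranges the calls when the code runs, rather than a rewrite by the parser, but the effect for the reader is the same.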
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0