3.9 Pipeline

20210103 Suppose we want to produce a base::summary() of a selection of numeric variables. We can pipe the output of dplyr::select() into base::summary().

# Select variables from the dataset and summarise the result.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  summary()
##     min_temp        max_temp        rainfall          sunshine     
##  Min.   :-8.70   Min.   :-4.10   Min.   :  0.000   Min.   : 0.00   
##  1st Qu.: 7.50   1st Qu.:17.90   1st Qu.:  0.000   1st Qu.: 4.90   
##  Median :11.90   Median :22.60   Median :  0.000   Median : 8.50   
##  Mean   :12.09   Mean   :23.21   Mean   :  2.348   Mean   : 7.63   
##  3rd Qu.:16.80   3rd Qu.:28.20   3rd Qu.:  0.600   3rd Qu.:10.60   
##  Max.   :33.90   Max.   :48.90   Max.   :474.000   Max.   :14.50   
....
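
The pipe passes the value on its left as the first argument of the function on its right, so the pipeline above can also be read as a nested call. The following is a minimal sketch of that equivalence. It assumes dplyr is installed and uses a small made-up tibble as a stand-in for ds; the values and the extra location column are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for ds; the values are made up for illustration.
ds <- tibble(
  location = c("A", "B", "C"),
  min_temp = c(7.5, 11.9, 16.8),
  max_temp = c(17.9, 22.6, 28.2),
  rainfall = c(0, 0.6, 15.6),
  sunshine = c(4.9, 8.5, NA)
)

# Pipeline form: each step feeds its result to the next.
piped <- ds %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  summary()

# Nested form: the same computation, read from the inside out.
nested <- summary(select(ds, min_temp, max_temp, rainfall, sunshine))

# Compare the two results.
identical(piped, nested)
```

The pipeline form reads in the order the steps are performed, which is the main reason to prefer it as the sequence of operations grows.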

Perhaps we would like to review only those observations where there is more than a little rain on the day of the observation, taken here to be at least 1mm. To do so we dplyr::filter() the observations.

# Select specific variables and observations from the dataset.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall >= 1)
## # A tibble: 51,598 × 4
##    min_temp max_temp rainfall sunshine
##       <dbl>    <dbl>    <dbl>    <dbl>
##  1     17.5     32.3      1         NA
##  2     13.1     30.1      1.4       NA
##  3     15.9     21.7      2.2       NA
##  4     15.9     18.6     15.6       NA
##  5     12.6     21        3.6       NA
##  6     13.5     22.9     16.8       NA
##  7     11.2     22.5     10.6       NA
##  8     12.5     24.2      1.2       NA
##  9     18.8     35.2      6.4       NA
## 10     14.6     29        3         NA
## # ℹ 51,588 more rows

This sequence of functions operating on the original rattle::weatherAUS dataset returns a subset of that dataset: the four selected variables, restricted to observations with at least 1mm of rain.
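
The filtered result can be stored or piped onward like any other data frame. The sketch below is an illustration only: it assumes dplyr is installed and uses a small made-up tibble in place of ds, with hypothetical values. It calls dplyr::filter() by its full name because, when dplyr is not attached, a bare filter() resolves to stats::filter(), which does something quite different. It also includes a missing rainfall value to show that dplyr::filter() drops rows where the condition evaluates to NA.

```r
library(dplyr)

# Hypothetical stand-in for ds; the values are made up for illustration.
ds <- tibble(
  min_temp = c(7.5, 11.9, 13.1, 15.9, 12.6),
  max_temp = c(17.9, 22.6, 30.1, 18.6, 21.0),
  rainfall = c(0, 0.6, 1, 15.6, NA),
  sunshine = c(4.9, 8.5, NA, NA, 10.6)
)

# Keep four variables, then keep only the days with at least 1mm of rain.
# Rows where rainfall is NA are dropped because the test is neither TRUE nor FALSE.
wet <- ds %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  dplyr::filter(rainfall >= 1)

# Number of observations that remain.
nrow(wet)

# The subset can be piped onward, for example into a summary.
wet %>% summary()
```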



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0