3.10 Pipeline Construction

20210103 Continuing with our pipeline example, we might want a base::summary() of the dataset.

# Summarise subset of variables for observations with rainfall.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall >= 1) %>%
  summary()

##     min_temp        max_temp        rainfall          sunshine     
##  Min.   :-8.50   Min.   :-4.10   Min.   :  1.000   Min.   : 0.000  
##  1st Qu.: 8.30   1st Qu.:15.50   1st Qu.:  2.200   1st Qu.: 2.400  
##  Median :12.20   Median :19.30   Median :  4.800   Median : 5.400  
##  Mean   :12.69   Mean   :20.15   Mean   :  9.756   Mean   : 5.352  
##  3rd Qu.:17.10   3rd Qu.:24.40   3rd Qu.: 11.000   3rd Qu.: 8.100  
##  Max.   :28.90   Max.   :46.30   Max.   :474.000   Max.   :14.200  
....

It could be useful to contrast this with a base::summary() of those observations where there was little or no rain.

# Summarise observations with little or no rainfall.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()

##     min_temp       max_temp        rainfall          sunshine     
##  Min.   :-8.7   Min.   :-2.10   Min.   :0.00000   Min.   : 0.000  
##  1st Qu.: 7.2   1st Qu.:19.00   1st Qu.:0.00000   1st Qu.: 6.200  
##  Median :11.8   Median :23.70   Median :0.00000   Median : 9.300  
##  Mean   :11.9   Mean   :24.21   Mean   :0.05903   Mean   : 8.357  
##  3rd Qu.:16.6   3rd Qu.:29.20   3rd Qu.:0.00000   3rd Qu.:11.000  
##  Max.   :33.9   Max.   :48.90   Max.   :0.90000   Max.   :14.500  
....
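The two summaries above suggest that days with rain tend to have less sunshine and lower maximum temperatures. As a minimal sketch, the same contrast can be drawn in a single pipeline by deriving a rain indicator and grouping on it. Because the dataset itself is not reproduced here, the example builds a small stand-in tibble, `ds_demo`, with made-up values. The names `ds_demo`, `rain_day`, `mean_max_temp` and `mean_sunshine` are assumptions introduced for illustration only.

```r
library(dplyr)

# Hypothetical stand-in for ds: same column names, made-up values.
ds_demo <- tibble(
  min_temp = c(8.0, 12.5, 15.1, 9.4, 17.3, 11.0),
  max_temp = c(15.2, 24.8, 19.9, 27.5, 22.0, 30.1),
  rainfall = c(3.2, 0.0, 12.4, 0.0, 1.0, 0.6),
  sunshine = c(2.1, 10.4, 4.0, 11.2, 5.5, 9.8)
)

# Flag each observation using the same threshold as the filters above,
# then summarise each group side by side.
rain_summary <- ds_demo %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  mutate(rain_day = rainfall >= 1) %>%
  group_by(rain_day) %>%
  summarise(
    n             = n(),
    mean_max_temp = mean(max_temp),
    mean_sunshine = mean(sunshine)
  )

print(rain_summary)
```

With real data that contains missing values, passing `na.rm = TRUE` to `mean()` would likely be needed, and observations with a missing `rainfall` would form a group of their own.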

Any number of functions can be included in a pipeline to achieve the results we desire. In the following chapters we will see many examples, some of which string together ten or more functions. Each step is generally easy to understand on its own. The power lies in combining many simple steps to produce something more complex.
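To illustrate why a chain of simple steps stays readable, the following sketch writes the same four operations twice: once as a pipeline and once as nested function calls. It again uses a hypothetical stand-in tibble, `ds_demo`, with made-up values. The `temp_band` column and its 20 degree cut-off are assumptions chosen purely for the example.

```r
library(dplyr)

# Hypothetical stand-in for ds: made-up values for illustration.
ds_demo <- tibble(
  max_temp = c(15.2, 24.8, 19.9, 27.5, 22.0),
  rainfall = c(3.2, 0.0, 12.4, 0.0, 1.0)
)

# Pipeline form: read top to bottom, one simple step per line.
piped <- ds_demo %>%
  filter(rainfall >= 1) %>%
  mutate(temp_band = if_else(max_temp >= 20, "warm", "cool")) %>%
  arrange(desc(rainfall)) %>%
  head(2)

# The same steps as nested calls: read from the inside out.
nested <- head(
  arrange(
    mutate(
      filter(ds_demo, rainfall >= 1),
      temp_band = if_else(max_temp >= 20, "warm", "cool")
    ),
    desc(rainfall)
  ),
  2
)

# Both forms should yield the same result.
print(all.equal(piped, nested))
```

The pipeline form also makes it easy to inspect intermediate results: truncating the chain after any step and running it shows the data at that point.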

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0