3.9 Pipeline Construction

20210103 Continuing with our pipeline example, we might want a base::summary() of selected variables for those observations where rainfall was at least 1.

# Summarise subset of variables for observations with rainfall.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall >= 1) %>%
  summary()
##     min_temp        max_temp        rainfall          sunshine     
##  Min.   :-8.50   Min.   :-4.10   Min.   :  1.000   Min.   : 0.000  
##  1st Qu.: 8.40   1st Qu.:15.50   1st Qu.:  2.200   1st Qu.: 2.400  
##  Median :12.20   Median :19.30   Median :  4.600   Median : 5.500  
##  Mean   :12.72   Mean   :20.19   Mean   :  9.681   Mean   : 5.379  
##  3rd Qu.:17.20   3rd Qu.:24.40   3rd Qu.: 10.800   3rd Qu.: 8.100  
##  Max.   :28.90   Max.   :46.30   Max.   :474.000   Max.   :14.200  
##  NA's   :127     NA's   :176                       NA's   :20056

It could be useful to contrast this with a base::summary() of those observations where there was little or no rain.

# Summarise observations with little or no rainfall.

ds %>% 
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(rainfall < 1) %>%
  summary()
##     min_temp        max_temp       rainfall          sunshine    
##  Min.   :-8.70   Min.   :-2.1   Min.   :0.00000   Min.   : 0.00  
##  1st Qu.: 7.20   1st Qu.:19.1   1st Qu.:0.00000   1st Qu.: 6.20  
##  Median :11.90   Median :23.8   Median :0.00000   Median : 9.30  
##  Mean   :11.97   Mean   :24.3   Mean   :0.05825   Mean   : 8.37  
##  3rd Qu.:16.70   3rd Qu.:29.3   3rd Qu.:0.00000   3rd Qu.:11.00  
##  Max.   :33.90   Max.   :48.9   Max.   :0.90000   Max.   :14.50  
##  NA's   :569     NA's   :612                      NA's   :70700
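
The two summaries above can also be contrasted within a single pipeline. The following is a minimal sketch, assuming the dplyr package is installed. Here ds_demo is a small hypothetical data frame with invented values standing in for ds, and the wet flag and the summary column names are introduced only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for ds; the values are invented.
ds_demo <- tibble(
  min_temp = c(8.4, 12.2, 7.2, 11.9, 16.7, NA),
  max_temp = c(15.5, 19.3, 19.1, 23.8, 29.3, 24.4),
  rainfall = c(2.2, 4.6, 0.0, 0.0, 0.4, 10.8),
  sunshine = c(2.4, 5.5, 6.2, 9.3, 11.0, NA)
)

# Flag each observation as wet (rainfall of 1 or more) or dry,
# then summarise each group separately.
contrast <- ds_demo %>%
  select(min_temp, max_temp, rainfall, sunshine) %>%
  filter(!is.na(rainfall)) %>%        # Both original filters also drop missing rainfall.
  mutate(wet = rainfall >= 1) %>%
  group_by(wet) %>%
  summarise(
    n             = n(),
    mean_max_temp = mean(max_temp, na.rm = TRUE),
    mean_sunshine = mean(sunshine, na.rm = TRUE)
  )

print(contrast)
```

The result has one row per group, which makes differences such as lower sunshine on wet days easier to compare than two separate summary() printouts. The trade-off is that we must name each statistic we want, whereas summary() reports a fixed set for every column.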

Any number of functions can be included in a pipeline to achieve the results we desire. In the following chapters we will see many examples, some of which string together ten or more functions. Each step along the way is generally easy to understand on its own. The power lies in what we can achieve by stringing together many simple steps to produce something more complex.
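
To see why a pipeline of simple steps stays readable, it can help to compare it with the equivalent nested function calls. The following is a minimal sketch, assuming the dplyr package is installed. Here ds_demo is a hypothetical data frame with invented values, not the ds dataset used above.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
ds_demo <- tibble(
  max_temp = c(15.5, 19.3, 19.1, 23.8),
  rainfall = c(2.2, 4.6, 0.0, 0.4)
)

# Pipeline form: read from top to bottom, one simple step per line.
piped <- ds_demo %>%
  filter(rainfall >= 1) %>%
  select(max_temp) %>%
  summary()

# Nested form: the same three steps, but read from the inside out.
nested <- summary(select(filter(ds_demo, rainfall >= 1), max_temp))

# Compare the two results.
print(all.equal(piped, nested))
```

Each %>% passes the result on its left as the first argument of the function on its right, so both forms perform the same computation. The nested form becomes harder to follow with every function added, while the pipeline simply grows by one line.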



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.