10.3 Add Columns

20200814 Columns are added to a dataset with dplyr::mutate(). Within a pipeline, dplyr::mutate() adds new columns to, or modifies existing columns of, the data as it passes through.

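In the example below we add two new columns to the dataset. A tee pipe (magrittr::%T>%) passes the data into the curly braces, where dplyr::select() picks the columns of interest and dplyr::sample_frac(), which by default keeps every row, shuffles them so that the rows printed are a random sample. The tee pipe then passes the unchanged result of dplyr::mutate() on to the final assignment, which stores it in a new variable.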

ds %>%
  mutate(range_temp=max_temp-min_temp,
         describe_temp=case_when(max_temp > 30 ~ "hot",
                                 max_temp > 20 ~ "mild",
                                 max_temp >  0 ~ "cold",
                                 TRUE          ~ "freezing")) %T>%
  {
    select(., date, location, ends_with("_temp")) %>%
    sample_frac() %>%
    print()
  } ->
newds
## # A tibble: 208,495 × 6
##    date       location     min_temp max_temp range_temp describe_temp
##    <date>     <chr>           <dbl>    <dbl>      <dbl> <chr>        
##  1 2013-07-27 Albury            0.3     13.9      13.6  cold         
##  2 2013-09-16 Wollongong       14.4     17.4       3    cold         
##  3 2021-12-18 WaggaWagga       20.4     38.7      18.3  hot          
##  4 2013-05-15 Cobar             7.5     19.1      11.6  cold         
##  5 2018-12-02 Sydney           17.7     34        16.3  hot          
##  6 2017-10-27 SalmonGums        6.4     21.9      15.5  mild         
##  7 2015-04-12 AliceSprings      9.5     31.2      21.7  hot          
##  8 2013-11-12 GoldCoast        21       25.7       4.7  mild         
##  9 2021-08-10 Sydney            8.8     22.8      14    mild         
## 10 2019-04-26 Wollongong       18.2     26.8       8.6  mild         
## # ℹ 208,485 more rows
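
The conditions in dplyr::case_when() are tested in order and the first one that holds determines the value. A missing max_temp satisfies none of the comparisons and so falls through to the TRUE branch, where it is labelled "freezing". The following is a minimal sketch of one way to keep such rows missing instead. The toy tibble and the names toy and newtoy are hypothetical stand-ins for ds and newds, and the extra is.na() branch is an assumption about the desired behaviour rather than part of the original example.

```r
library(dplyr)
library(magrittr)  # Provides the tee pipe %T>%.

# Hypothetical toy data standing in for ds.
toy <- tibble(location = c("A", "B", "C", "D", "E"),
              min_temp = c(18, 12, 2, -8, NA),
              max_temp = c(35, 25, 10, -1, NA))

toy %>%
  mutate(range_temp    = max_temp - min_temp,
         # Conditions are checked top to bottom; the first match wins.
         # The is.na() branch keeps missing temperatures missing.
         describe_temp = case_when(is.na(max_temp) ~ NA_character_,
                                   max_temp > 30   ~ "hot",
                                   max_temp > 20   ~ "mild",
                                   max_temp >  0   ~ "cold",
                                   TRUE            ~ "freezing")) %T>%
  print() ->
newtoy
```

Without the is.na() branch the last row of the toy data would be described as "freezing" even though no temperature was recorded.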

To overwrite the original dataset instead of saving the result under a new name, replace the first pipe with the assignment pipe magrittr::%<>% and drop the trailing assignment to newds.
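
As a brief sketch of the assignment pipe, again using a hypothetical toy tibble rather than ds, the following updates the dataset in place. The assignment pipe is not exported by dplyr, so magrittr needs to be attached.

```r
library(dplyr)
library(magrittr)  # Provides the assignment pipe %<>%.

# Hypothetical toy data standing in for ds.
toy <- tibble(min_temp = c(18, 2),
              max_temp = c(35, 10))

# Equivalent to: toy <- toy %>% mutate(range_temp = max_temp - min_temp)
toy %<>% mutate(range_temp = max_temp - min_temp)

names(toy)
```

Because the original data is replaced, this form is best kept for steps that are safe to rerun from the start of a script.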



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0