10.3 Add Columns
20200814

Columns are added to a dataset with dplyr::mutate(). Within a pipeline, dplyr::mutate() adds the new columns to the data as it passes through. The original dataset is left unchanged unless the result is assigned back to it.
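In the example below we add two new columns to the dataset. The first, range_temp, is the difference between max_temp and min_temp. The second, describe_temp, is a label derived from max_temp using dplyr::case_when(). A tee pipe (magrittr's %T>%) prints a random sample of the resulting dataset, using dplyr::select() and dplyr::sample_frac() within the curly braces. Because sample_frac() defaults to size = 1, it returns all rows in random order, and print() then shows only the first ten. The tee pipe passes the mutated data, not the printed sample, on down the main pipeline. The result is then assigned to a new variable, newds, using the right-assignment operator ->.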
library(dplyr)     # mutate(), case_when(), select(), sample_frac(), ends_with()
library(magrittr)  # %T>% tee pipe

ds %>%
  mutate(range_temp=max_temp-min_temp,
         describe_temp=case_when(max_temp > 30 ~ "hot",
                                 max_temp > 20 ~ "mild",
                                 max_temp > 0  ~ "cold",
                                 TRUE          ~ "freezing")) %T>%
  {
    select(., date, location, ends_with("_temp")) %>%
      sample_frac() %>%
      print()
  } ->
newds
## # A tibble: 208,495 × 6
## date location min_temp max_temp range_temp describe_temp
## <date> <chr> <dbl> <dbl> <dbl> <chr>
## 1 2013-07-27 Albury 0.3 13.9 13.6 cold
## 2 2013-09-16 Wollongong 14.4 17.4 3 cold
## 3 2021-12-18 WaggaWagga 20.4 38.7 18.3 hot
## 4 2013-05-15 Cobar 7.5 19.1 11.6 cold
## 5 2018-12-02 Sydney 17.7 34 16.3 hot
## 6 2017-10-27 SalmonGums 6.4 21.9 15.5 mild
## 7 2015-04-12 AliceSprings 9.5 31.2 21.7 hot
## 8 2013-11-12 GoldCoast 21 25.7 4.7 mild
## 9 2021-08-10 Sydney 8.8 22.8 14 mild
## 10 2019-04-26 Wollongong 18.2 26.8 8.6 mild
## # ℹ 208,485 more rows
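The same pipeline can be tried on a tiny dataset to see two details at work. dplyr::case_when() tests its conditions from top to bottom and uses the first one that is TRUE. The tee pipe passes its left-hand side on, not the result of the braced block. The sketch below is illustrative only. The tibble toy, its values, and the result name toy2 are hypothetical stand-ins for the weather dataset ds and newds.

```r
library(dplyr)
library(magrittr)

# Hypothetical toy data standing in for the weather dataset ds.
toy <- tibble(
  location = c("Sydney", "GoldCoast", "Albury", "Canberra"),
  min_temp = c(17.7, 21.0, 0.3, -5.0),
  max_temp = c(34.0, 25.7, 13.9, -1.0)
)

toy %>%
  mutate(range_temp = max_temp - min_temp,
         # case_when() checks conditions top to bottom and the first TRUE wins,
         # so 34.0 is "hot" even though it is also > 20 and > 0.
         describe_temp = case_when(max_temp > 30 ~ "hot",
                                   max_temp > 20 ~ "mild",
                                   max_temp > 0  ~ "cold",
                                   TRUE          ~ "freezing")) %T>%
  # The tee pipe runs this block only for its side effect (printing) and
  # passes the mutated tibble, not print()'s result, down the pipeline.
  { print(select(., location, ends_with("_temp"))) } ->
  toy2
```

The final TRUE ~ "freezing" acts as a catch-all. Any max_temp not caught by an earlier condition, here -1.0, falls through to it.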
To overwrite the original dataset instead of saving it as a new dataset, replace the first pipe with magrittr's assignment pipe %<>% and drop the final assignment to newds.
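A minimal sketch of the assignment pipe follows. It uses a hypothetical three-row tibble named ds in place of the real weather dataset. The line with %<>% is equivalent to writing ds <- ds %>% mutate(...).

```r
library(dplyr)
library(magrittr)  # provides the assignment pipe %<>%

# Hypothetical three-row stand-in for the weather dataset.
ds <- tibble(location = c("Albury", "Sydney", "Cobar"),
             min_temp = c(0.3, 17.7, 7.5),
             max_temp = c(13.9, 34.0, 19.1))

# Pipe ds into mutate() and assign the result back to ds.
ds %<>% mutate(range_temp = max_temp - min_temp)

names(ds)
# "location" "min_temp" "max_temp" "range_temp"
```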
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0