10.5 Add Counts

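20200814 dplyr::add_count() adds a new column to the dataset that records, for each row, the number of rows sharing that row's value of the counting variable (location in the example below). Unlike dplyr::count(), it keeps every original row rather than collapsing the dataset to one row per group. The new column is named n by default.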

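# %<>% assigns the result of the whole pipeline back to ds.
# %T>% runs the preview block for its side effect only and
# passes its input (the counted dataset) on unchanged.
ds %<>%
  add_count(location) %T>%
  {
    select(., date, location, n) %>%
    sample_frac() %>%  # Shuffle the rows so the preview is a random sample.
    print()
  }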

## # A tibble: 226,868 × 3
##    date       location             n
##    <date>     <chr>            <int>
##  1 2009-02-17 Tuggeranong       4739
##  2 2017-10-18 SalmonGums        4662
##  3 2014-10-21 PearceRAAF        4708
##  4 2021-02-06 BadgerysCreek     4690
##  5 2009-04-06 Witchcliffe       4583
##  6 2023-03-05 NorahHead         4704
##  7 2012-03-09 MelbourneAirport  4709
##  8 2021-05-21 Launceston        4740
##  9 2019-12-23 WaggaWagga        4709
## 10 2022-07-18 Townsville        4740
## # ℹ 226,858 more rows
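
To see what add_count() computes, it can help to try it on a dataset small enough to check by eye. The sketch below uses a hypothetical five-row tibble named toy (not the weather dataset above) and compares the result with the longer group_by() and mutate() formulation, which should give the same counts.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only.
toy <- tibble(
  location = c("Canberra", "Sydney", "Canberra", "Perth", "Canberra"),
  rainfall = c(0.0, 3.2, 1.4, 0.0, 5.6)
)

# add_count() keeps every row and appends the group size as column n.
counted <- toy %>% add_count(location)

# The same counts spelled out with group_by() and mutate().
manual <- toy %>%
  group_by(location) %>%
  mutate(n = n()) %>%
  ungroup()

print(counted)
```

Each Canberra row receives the count 3, while the Sydney and Perth rows receive 1, and the number of rows is unchanged.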

names(ds)

##  [1] "date"            "location"        "min_temp"        "max_temp"       
##  [5] "rainfall"        "evaporation"     "sunshine"        "wind_gust_dir"  
##  [9] "wind_gust_speed" "wind_dir_9am"    "wind_dir_3pm"    "wind_speed_9am" 
## [13] "wind_speed_3pm"  "humidity_9am"    "humidity_3pm"    "pressure_9am"   
## [17] "pressure_3pm"    "cloud_9am"       "cloud_3pm"       "temp_9am"       
## [21] "temp_3pm"        "rain_today"      "risk_mm"         "rain_tomorrow"  
## [25] "n"
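
The default column name n says little about what was counted, and, as far as the dplyr documentation describes it, calling add_count() on a dataset that already has a column n stores the new counts in nn instead. The name argument of add_count() avoids both issues. A minimal sketch, again on a hypothetical toy tibble, with location_count as an illustrative name:

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only.
toy <- tibble(location = c("Canberra", "Sydney", "Canberra"))

# name= sets the name of the count column explicitly.
named <- toy %>% add_count(location, name = "location_count")

names(named)
```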

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0