14.2 ML Data and Variables

20210104

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

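library(rattle)    # weatherAUS dataset
library(magrittr)  # %>% and %<>% pipes
library(janitor)   # clean_names()
library(dplyr)     # sample_frac(), select()

# Load the dataset into the template variable ds.
dsname <- "weatherAUS"
ds     <- get(dsname)

nobs   <- nrow(ds)

# Normalise the variable names, keeping a map back to the originals.
vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

# Record the variables, with the target moved to the end of the list.
vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()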
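
To see what the template variables hold, the same steps can be traced on a toy data frame. This is a minimal sketch in base R only: the three-column data frame is made up for illustration, and to_snake() is a hypothetical stand-in for janitor::clean_names(), not its actual implementation.

```r
# Toy stand-in for weatherAUS (made-up values, for illustration only).
ds <- data.frame(Date         = as.Date(c("2020-01-01", "2020-01-02")),
                 MinTemp      = c(12.3, 14.1),
                 RainTomorrow = c("No", "Yes"))

# Hypothetical helper: a rough base R approximation of snake_case cleaning.
to_snake <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))

# Keep the original names, normalise, then index the originals by the new names.
vnames        <- names(ds)
names(ds)     <- to_snake(names(ds))
names(vnames) <- names(ds)

# Move the target to the end of the variable list, as the template does.
vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, vars)))

vnames["min_temp"]  # look up the original name of a normalised variable
vars                # the target is the last element
```

The named vector vnames is handy when a plot label or report should show the original variable name rather than the normalised one.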

It is always useful to remind ourselves of the dataset with a random sample:

ds  %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 176,747 x 7
##    date       location    wind_dir_9am rainfall min_temp wind_speed_3pm temp_9am
##    <date>     <chr>       <ord>           <dbl>    <dbl>          <dbl>    <dbl>
##  1 2018-01-29 NorahHead   NNE               0.2     21.7             17     24.5
##  2 2017-11-22 Dartmoor    NNE               0       14.9             11     26.4
##  3 2020-02-16 MelbourneA… SW                3.4     15.2             11     17.1
##  4 2015-11-03 Penrith     SW                0.2     17.7             11     18.2
##  5 2019-07-26 Canberra    NNE               0       -1.8              9      2  
##  6 2010-09-26 Watsonia    NE                0        6.9             19     11.8
##  7 2010-09-21 BadgerysCr… NNW               0.2      6.3             24     16.6
##  8 2010-08-24 Brisbane    W                10.2     14               20     17.9
##  9 2019-07-10 Penrith     NE                0        3.1             11      9.7
## 10 2012-11-07 PerthAirpo… SW                0.2      9.5             26     18.2
## # … with 176,737 more rows
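
The sample above changes on every run because no random seed is set. If a repeatable sample is wanted, one option is to fix the seed first. The sketch below assumes base R and uses the built-in airquality dataset as a stand-in for weatherAUS. The seed value 42 is arbitrary.

```r
set.seed(42)  # arbitrary seed so the same sample is drawn each run

ds <- airquality  # built-in dataset standing in for weatherAUS

# Shuffle all rows, mirroring sample_frac() with its default size of 1.
rows <- sample(nrow(ds))

# Keep the first two columns and add two random columns from position 3 onwards.
cols <- sample(3:ncol(ds), 2)
smp  <- ds[rows, c(1, 2, cols)]

head(smp)
```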

References

Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.