14.2 ML Data and Variables

20210104

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Williams (2017). See Chapter 8 for details.

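library(rattle)    # Provides the weatherAUS dataset.
library(magrittr)  # Provides the %>% and %<>% pipe operators.
library(janitor)   # Provides clean_names().
library(dplyr)     # Provides sample_frac() and select().

dsname <- "weatherAUS"
ds     <- get(dsname)

nobs   <- nrow(ds)

# Normalise the variable names, keeping a map from new names to originals.
vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

# List the variables with the target variable placed last.
vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()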
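
The effect of the last few template lines can be easier to see on a small example. The following is a minimal sketch in base R using made-up names (toy_vars, toy_target, toy_original, toy_cleaned and toy_vnames are hypothetical and stand in for the real weatherAUS columns). It suggests that prepending the target, removing duplicates and reversing leaves the target as the last element, and that the named vector acts as a lookup from a cleaned name back to its original.

```r
# Hypothetical column names standing in for names(ds).
toy_vars   <- c("date", "location", "min_temp", "rain_tomorrow")
toy_target <- "rain_tomorrow"

# Prepend the target, drop its later duplicate, then reverse the order.
# The remaining variables are reversed and the target ends up last.
toy_vars <- rev(unique(c(toy_target, toy_vars)))
print(toy_vars)

# A named vector maps each cleaned name back to its original name.
toy_original <- c("Date", "MinTemp", "RainTomorrow")
toy_cleaned  <- c("date", "min_temp", "rain_tomorrow")
toy_vnames   <- toy_original
names(toy_vnames) <- toy_cleaned
print(toy_vnames[["min_temp"]])
```

Keeping the target in a known position is a convenience for later steps that treat the remaining variables as inputs.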

It is always useful to remind ourselves of the dataset with a random sample:

ds  %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 176,747 × 7
##    date       location     cloud_3pm sunshine evaporation max_temp min_temp
##    <date>     <chr>            <int>    <dbl>       <dbl>    <dbl>    <dbl>
##  1 2017-11-28 Canberra             3     NA          NA       28.4     11.5
##  2 2012-06-26 Woomera              6      1.4         2.6     15.5      4.9
##  3 2015-01-18 Cairns               7      4.1         6.8     34.2     26.2
##  4 2018-06-11 Sale                NA     NA          NA       15.3      2.2
##  5 2009-10-10 Watsonia             4     10.4         3.2     19.3      2.7
##  6 2010-04-12 Richmond            NA     NA           9.2     24.1     10.2
##  7 2008-11-06 Hobart               7      5           5.2     15.2      9.6
##  8 2011-10-21 CoffsHarbour         1     11.8         3.8     23.3     13.2
##  9 2013-07-01 Newcastle            6     NA          NA       17.5     10.6
## 10 2011-01-20 Adelaide            NA     13.5         7       37       17  
## # … with 176,737 more rows
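
Because both the rows and the columns shown above are chosen at random, each run displays a different view. Where a repeatable view is wanted, one option is to fix the random seed first. The sketch below assumes a small made-up data frame (toy) and a helper (random_view) that are not part of the template, and uses base R indexing rather than the dplyr calls above so that it runs without extra packages.

```r
# A small hypothetical data frame standing in for ds.
toy <- data.frame(date     = as.Date("2020-01-01") + 0:9,
                  location = rep(c("A", "B"), 5),
                  v1       = 1:10,
                  v2       = 11:20,
                  v3       = 21:30,
                  v4       = 31:40)

# Shuffle the rows and keep the first two columns plus two random others.
random_view <- function(df, seed) {
  set.seed(seed)                    # Fix the seed for a repeatable result.
  cols <- sample(3:ncol(df), 2)     # Random column positions from 3 onward.
  rows <- sample(nrow(df))          # A random permutation of the rows.
  df[rows, c(1, 2, cols)]
}

first  <- random_view(toy, seed = 123)
second <- random_view(toy, seed = 123)

# The same seed gives the same sample on both calls.
print(identical(first, second))
```

Sampling column positions from 3 onward mirrors the original call, which always shows date and location and then draws the remaining columns at random.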

References

Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0