14.2 ML Data and Variables

20210104

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

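library(rattle)    # weatherAUS dataset
library(magrittr)  # %>% and %<>% pipes
library(janitor)   # clean_names()
library(dplyr)     # sample_frac(), select()

dsname <- "weatherAUS"
ds     <- get(dsname)

nobs   <- nrow(ds)

# Normalise the variable names, keeping the originals in vnames.
vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

# List the variables with the target last.
vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()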
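To see what the template variables hold, the following minimal sketch repeats the same steps on a toy data frame in base R. The three-row data frame and its values are hypothetical, the cleaned names are assigned by hand as a stand-in for clean_names(), and the formula at the end is only one assumed use of target, not something this section defines.

```r
# Toy stand-in for weatherAUS (hypothetical values, original-style names).
ds <- data.frame(
  Date         = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  MinTemp      = c(12.1, 14.3, 9.8),
  RainTomorrow = factor(c("No", "Yes", "No"))
)

# Keep the original names, then assign cleaned names by hand
# (a stand-in for janitor::clean_names()).
vnames        <- names(ds)
names(ds)     <- c("date", "min_temp", "rain_tomorrow")
names(vnames) <- names(ds)

# Look up the original name of a cleaned variable.
vnames[["rain_tomorrow"]]

# Same ordering step as the template: the target ends up last.
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, names(ds))))
vars

# One possible use of target (an assumption): a modelling formula.
form <- as.formula(paste(target, "~ ."))
form
```

Because vnames is named by the cleaned names, it acts as a lookup table back to the original column names, which can be handy when labelling plots or reports.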

It is always useful to remind ourselves of the dataset with a random sample of rows, here showing the date, the location, and five randomly chosen columns:

ds  %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 198,656 × 7
##    date       location  wind_dir_3pm min_temp cloud_3pm rain_tomorrow rain_today
##    <date>     <chr>     <ord>           <dbl>     <int> <fct>         <fct>     
##  1 2017-09-05 Hobart    NW                2.8         3 No            Yes       
##  2 2010-08-06 Melbourn… SW                5.9         6 No            Yes       
##  3 2011-09-04 Sale      NW                8.2         7 No            No        
##  4 2021-03-22 PearceRA… SW               15.6         8 No            No        
##  5 2019-05-18 Townsvil… E                19          NA No            No        
##  6 2009-11-03 Brisbane  NE               14.9         0 No            No        
##  7 2019-12-23 Adelaide  WSW              18.4        NA No            No        
##  8 2015-09-07 Bendigo   WSW               6.2         8 Yes           Yes       
##  9 2020-09-03 Cobar     NW               13.2        NA Yes           No        
## 10 2019-11-03 Tuggeran… E                16.1        NA Yes           No        
## # … with 198,646 more rows
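The rows and columns shown above change on every run. As a rough base R sketch of the same idea, the code below shuffles a toy data frame and picks extra columns at random, with a seed set so the result is repeatable. The data frame toy, its values, and the seed are hypothetical, and the base R indexing is a stand-in for sample_frac() and select() rather than the template's own code.

```r
set.seed(42)  # arbitrary seed so the sample is repeatable

# Toy data frame (hypothetical values).
toy <- data.frame(
  date          = as.Date("2020-01-01") + 0:5,
  location      = c("Hobart", "Sale", "Cobar", "Hobart", "Sale", "Cobar"),
  min_temp      = c(2.8, 8.2, 13.2, 5.9, 6.2, 15.6),
  rain_today    = factor(c("Yes", "No", "No", "Yes", "Yes", "No")),
  rain_tomorrow = factor(c("No", "No", "Yes", "No", "Yes", "No"))
)

# Shuffle all rows, as sample_frac() does with its default size of 1.
shuffled <- toy[sample(nrow(toy)), ]

# Keep the first two columns plus two others chosen at random.
extra <- sample(3:ncol(toy), 2)
peek  <- shuffled[, c(1, 2, extra)]
head(peek)
```

Calling set.seed() before the original pipeline should likewise make its sample repeatable from run to run.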

References

Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0