15.1 Machine Learning Setup

Packages used in this chapter include magrittr (Bache and Wickham 2022) and rattle (G. Williams 2024).

Packages are loaded into the currently running R session from your local library directories on disk. Missing packages can be installed using utils::install.packages() within R. On Ubuntu, for example, R packages can also be installed using $ wajig install r-cran-<pkgname>.
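As a minimal sketch of the check-then-install step described above, the following identifies which packages are not yet available in the local library before calling utils::install.packages(). The helper name missing_packages() is hypothetical and introduced here only for illustration; the install is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed     <- c("magrittr", "rattle")
to_install <- missing_packages(needed)

# Install only what is missing (needs network access, so it is guarded here).
if (length(to_install) > 0 && interactive()) {
  utils::install.packages(to_install)
}
```

Checking first avoids reinstalling packages that are already present in the library.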

# Load required packages from local library into the R session.

library(magrittr)     # Data pipelines: %>% %<>% %T>% equals().
library(rattle)       # Dataset: weatherAUS.
library(janitor)      # Cleanup: clean_names().
library(dplyr)        # Wrangling: sample_frac().

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

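# Load the dataset into the template variable ds.

dsname <- "weatherAUS"
ds     <- get(dsname)

nobs   <- nrow(ds)

# Normalise the variable names, keeping a map back to the originals.

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

# Identify the target and place it last in the list of variables.

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()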
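To illustrate what the template variables hold, here is a small sketch using base R only and three hypothetical variable names in place of the full dataset. The cleaned names are written by hand here (an assumption about what clean_names() would produce) rather than computed, and the pipeline is written as nested calls so the example runs without magrittr.

```r
# Hypothetical original and normalised names, written by hand.
original <- c("MinTemp", "RainTomorrow", "MaxTemp")
cleaned  <- c("min_temp", "rain_tomorrow", "max_temp")

# vnames maps each normalised name back to its original name.
vnames        <- original
names(vnames) <- cleaned
vnames[["rain_tomorrow"]]
## [1] "RainTomorrow"

# Prepend the target, drop its duplicate, then reverse so it ends up last.
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, cleaned)))
vars
## [1] "max_temp"      "min_temp"      "rain_tomorrow"
```

Note that reversing also reverses the order of the remaining variables, not only the position of the target.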

The variable form is used in this chapter as the formula describing the model to be built.
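The definition of form does not appear in this section. One plausible construction, offered as an assumption rather than as the author's code, builds the formula from target. The toy data frame below is hypothetical and only serves to show how the dot on the right-hand side expands to every other variable.

```r
# Assumed construction: model the target using all other variables.
target <- "rain_tomorrow"
form   <- as.formula(paste(target, "~ ."))
form
## rain_tomorrow ~ .

# Hypothetical toy data to show how "." expands to the remaining columns.
toy <- data.frame(min_temp      = c(7.8, -1.0, 17.4),
                  max_temp      = c(23.6, 11.6, 26.3),
                  rain_tomorrow = factor(c("No", "Yes", "No")))

predictors <- attr(terms(form, data = toy), "term.labels")
predictors
## [1] "min_temp" "max_temp"
```

Building the formula from target means the same template code can be reused with a different dataset by changing only the template variables.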

form
## rain_tomorrow ~ .

# Randomly reorder the observations to review a sample of the data.

ds %>% sample_frac()
## # A tibble: 226,868 × 24
##    date       location   min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>         <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2018-03-16 Portland        7.8     23.6      0.8        NA       NA  
##  2 2021-09-13 Canberra       -1       11.6      0.6        NA       NA  
##  3 2013-01-15 NorahHead      17.4     26.3      1          NA       NA  
##  4 2010-12-31 Portland       13.5     34.4      0           6.6     13  
##  5 2020-04-06 Ballarat        6.9     12.5      5.4        NA       NA  
##  6 2014-09-22 Portland        3.7     22.4      0.2         2.6     10.3
##  7 2023-02-01 PearceRAAF     18.5     34        0          NA       12.7
##  8 2011-10-23 SalmonGums     13.8     15.4      2.4        NA       NA  
##  9 2009-01-24 Moree          23.2     35.5      0           5.2     10.9
## 10 2018-07-18 Canberra        0.2     12.8      0          NA       NA  
## # ℹ 226,858 more rows
## # ℹ 17 more variables: wind_gust_dir <ord>, wind_gust_speed <dbl>,
## #   wind_dir_9am <ord>, wind_dir_3pm <ord>, wind_speed_9am <dbl>,
## #   wind_speed_3pm <dbl>, humidity_9am <int>, humidity_3pm <int>,
## #   pressure_9am <dbl>, pressure_3pm <dbl>, cloud_9am <int>, cloud_3pm <int>,
## #   temp_9am <dbl>, temp_3pm <dbl>, rain_today <fct>, risk_mm <dbl>,
## #   rain_tomorrow <fct>

References

Bache, Stefan Milton, and Hadley Wickham. 2022. Magrittr: A Forward-Pipe Operator for R. https://magrittr.tidyverse.org.
Williams, Graham. 2024. Rattle: Graphical User Interface for Data Science in R. https://rattle.togaware.com/.
Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2022 Graham.Williams@togaware.com. Creative Commons Attribution-ShareAlike 4.0.