19.1 Clustering Setup

THIS SECTION IS UNDER DEVELOPMENT. PLEASE CHECK BACK LATER

The R packages used in this chapter include biclust (Kaiser et al. 2023), together with dplyr, janitor, magrittr, and rattle for preparing the data.

# Load required packages from local library into the R session.

library(biclust)      # Bicluster analysis.
library(dplyr)        # Wrangling: glimpse() group_by() select() mutate() sample_frac().
library(janitor)      # Cleanup: clean_names().
library(magrittr)     # Pipes: %>% %<>%.
library(rattle)       # Weather dataset.

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Graham J. Williams (2017). See Chapter 8 for details.

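dsname <- "weatherAUS"            # Name of the dataset.
ds     <- get(dsname)             # Template variable referencing the dataset.

nobs   <- nrow(ds)                # Number of observations.

vnames <- names(ds)               # Record the original variable names.
ds    %<>% clean_names(numerals="right")  # Normalise names to snake_case.
names(vnames) <- names(ds)        # Index original names by the normalised names.

vars   <- names(ds)               # All variables in the dataset.
target <- "rain_tomorrow"         # The variable to be predicted.
vars   <- c(target, vars) %>% unique() %>% rev()  # Target last (rev() also reverses the others).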
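To see what the template variables hold, the following sketch replays the renaming steps on a tiny hypothetical data frame whose column names mimic those of weatherAUS. The values are invented purely for illustration, and the sketch assumes the janitor and magrittr packages are installed.

```r
library(janitor)   # clean_names()
library(magrittr)  # %<>% and %>%

# Hypothetical toy data frame with weatherAUS-style column names.
# Values are invented for illustration only.
ds <- data.frame(
  MinTemp      = c(8.0, 12.5),
  WindSpeed9am = c(19, 6),
  RainTomorrow = c("No", "Yes")
)

vnames <- names(ds)                       # Original names, e.g. "WindSpeed9am".
ds %<>% clean_names(numerals = "right")   # Normalise to snake_case in place.
names(vnames) <- names(ds)                # Index original names by clean names.

vnames["wind_speed_9am"]                  # Recover the original name.

target <- "rain_tomorrow"
vars   <- c(target, names(ds)) %>% unique() %>% rev()
vars
```

Because vnames is indexed by the cleaned names, the original label of any variable remains available, for example for plot titles. Note that rev() reverses the whole vector: the target ends up last, and the remaining variables appear in reverse column order.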

It is always useful to remind ourselves of the dataset with a random sample:

ds %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))  # Shuffle rows; keep date, location and 5 random columns.
## # A tibble: 208,495 × 7
##    date       location     wind_speed_9am temp_9am wind_gust_dir rain_today
##    <date>     <chr>                 <dbl>    <dbl> <ord>         <fct>     
##  1 2012-01-12 Newcastle                19     18.8 <NA>          No        
##  2 2020-12-17 Penrith                   6     23.8 N             Yes       
##  3 2010-04-01 Brisbane                  7     23.6 ESE           Yes       
##  4 2014-08-27 AliceSprings              0     15.2 E             No        
##  5 2012-11-23 MountGambier             13     15.3 SW            No        
##  6 2011-03-05 Launceston                7      8.8 SE            No        
....
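Because sample_frac() draws a fresh random shuffle on each run, the rows and columns shown above will differ every time. The following minimal sketch shows a repeatable alternative: it fixes the seed with set.seed() and uses slice_sample(), which supersedes sample_frac() from dplyr 1.0.0 onwards. The tibble toy is a hypothetical stand-in with invented values so that the example runs without rattle.

```r
library(dplyr)

# Hypothetical stand-in for the cleaned weatherAUS data.
# Values are invented for illustration only.
toy <- tibble(
  date          = as.Date("2020-01-01") + 0:9,
  location      = rep(c("Canberra", "Sydney"), 5),
  min_temp      = seq(5, 14),
  max_temp      = seq(20, 29),
  rain_tomorrow = factor(rep(c("No", "Yes"), 5))
)

set.seed(42)  # Fix the random number generator so the sample is repeatable.

# Keep date and location, then add two randomly chosen other columns.
rsample <- toy %>%
  slice_sample(n = 4) %>%
  select(date, location, sample(3:ncol(toy), 2))

rsample
```

Rerunning the block with the same seed reproduces the same rows and columns, which makes the displayed sample easier to discuss in text.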

References

Kaiser, Sebastian, Rodrigo Santamaria, Tatsiana Khamiakova, Martin Sill, Roberto Theron, Luis Quintales, Friedrich Leisch, Ewoud De Troyer, and Sami Leon. 2023. Biclust: BiCluster Algorithms.
Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0