19.1 Clustering Setup

20200902 Packages used in this chapter include biclust (Kaiser et al. 2021).

# Load required packages from local library into the R session.

library(biclust)      # Bicluster analysis.
library(dplyr)        # Wrangling: glimpse() group_by() print() select() mutate().
library(janitor)      # Cleanup: clean_names().
library(magrittr)     # Pipelines: %>% %<>%.
library(rattle)       # Weather dataset.

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced by Williams (2017). See Chapter 8 for details.

dsname <- "weatherAUS"
ds     <- get(dsname)
    
nobs   <- nrow(ds)

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
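
To make the effect of the last few template lines concrete, here is a minimal base R sketch. It assumes a handful of made-up column names rather than the full weatherAUS set, and the variable cleaned is a hypothetical stand-in for names(ds) after clean_names(). It shows that vnames maps each cleaned name back to its original name, and that the unique() and rev() steps leave the target as the last element of vars.

```r
# Hypothetical original and cleaned column names, standing in for
# names(ds) before and after clean_names(); not the full dataset.
vnames        <- c("Date", "Location", "MinTemp", "RainToday", "RainTomorrow")
cleaned       <- c("date", "location", "min_temp", "rain_today", "rain_tomorrow")
names(vnames) <- cleaned

# Look up the original name of a cleaned variable.
vnames[["min_temp"]]

# Same logic as the template, written without pipes: prepend the target,
# drop its duplicate, then reverse so that the target ends up last.
vars   <- cleaned
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, vars)))
vars
```

Note that the remaining variables also appear in reverse order, which is harmless for the steps that follow since they refer to variables by name.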

It is always useful to remind ourselves of the dataset with a random sample:

ds  %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 176,747 x 7
##    date       location  risk_mm pressure_3pm pressure_9am rainfall wind_gust_dir
##    <date>     <chr>       <dbl>        <dbl>        <dbl>    <dbl> <ord>        
##  1 2013-06-02 Penrith      13.4          NA           NA       8.4 SSW          
##  2 2016-01-10 Badgerys…     0          1016.        1020.      0   ESE          
##  3 2012-04-09 Nuriootpa     0          1027.        1027.      1   SW           
##  4 2019-11-16 Cobar         0          1014         1017       0   SW           
##  5 2008-10-02 Melbourne     0          1011.        1014.      0   N            
##  6 2013-05-20 Wollongo…     0          1014.        1018.      0   WSW          
##  7 2014-07-07 SalmonGu…    14            NA           NA       1.4 NW           
##  8 2011-11-11 Bendigo       0          1020.        1024.      0   WNW          
##  9 2019-01-11 Woomera       0          1008.        1011.      0   N            
## 10 2020-02-01 Tuggeran…     0          1008.        1011.      0   W            
## # … with 176,737 more rows
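
Because both the row order and the choice of columns are random, the sample above differs on every run. If a repeatable sample is wanted, one option is to fix the seed first. The sketch below illustrates the idea in base R on a small made-up data frame (toy is a hypothetical stand-in for ds, and the seed value 42 is arbitrary); the same set.seed() call placed before the dplyr pipeline should have the same effect.

```r
# Toy data frame standing in for ds (values are made up).
toy <- data.frame(id = 1:6, a = letters[1:6], b = (1:6)^2, c = 6:1)

# Fix the random number stream, then shuffle all rows and keep the
# first column plus two randomly chosen others.
set.seed(42)
first <- toy[sample(nrow(toy)), c(1, sample(2:ncol(toy), 2))]

# Resetting the same seed reproduces exactly the same sample.
set.seed(42)
second <- toy[sample(nrow(toy)), c(1, sample(2:ncol(toy), 2))]

identical(first, second)
```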

References

Kaiser, Sebastian, Rodrigo Santamaria, Tatsiana Khamiakova, Martin Sill, Roberto Theron, Luis Quintales, Friedrich Leisch, Ewoud De Troyer, and Sami Leon. 2021. Biclust: BiCluster Algorithms. https://CRAN.R-project.org/package=biclust.
Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.