19 Cluster Analysis

THIS CHAPTER IS UNDER DEVELOPMENT. PLEASE CHECK BACK LATER

20211122 Cluster analysis (or clustering) is widely used in data mining to identify groups of similar observations. It is well supported in R (R Core Team 2025), with many packages available for preparing data for cluster analysis, identifying a good number of clusters, performing the clustering, evaluating the result, and assigning new observations to existing clusters. A variety of cluster analysis algorithms are available. Each represents the clustering by generating a cluster label for every observation. Performance is often measured by comparing the distances between points within a cluster against the distances between clusters.
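The steps listed above can be sketched with base R alone. This is a minimal illustration, not the chapter's prescribed method. The choice of k-means, the `iris` dataset, the value `k = 3`, and the helper name `assign_cluster` are all assumptions made for the example.

```r
# A minimal k-means sketch using only base R (the stats package).
# Dataset (iris), k = 3, and assign_cluster() are illustrative assumptions.

set.seed(42)  # k-means starts from random centres, so fix the seed.

# Prepare: keep the numeric columns and standardise them so that no
# single variable dominates the Euclidean distances.
obs <- scale(iris[, 1:4])

# Choose k: total within-cluster sum of squares for k = 2..6.
# A clear bend ("elbow") in these values suggests a candidate k.
wss <- sapply(2:6, function(k) {
  kmeans(obs, centers = k, nstart = 20)$tot.withinss
})

# Cluster: fit the model with the chosen number of clusters.
model <- kmeans(obs, centers = 3, nstart = 20)

# Representation: one cluster label per observation.
labels <- model$cluster

# Evaluate: share of total variation explained by between-cluster
# separation. Values nearer 1 indicate tighter, better separated clusters.
ratio <- model$betweenss / model$totss

# Assign: place a new (already scaled) observation in the cluster
# whose centre is nearest by Euclidean distance.
assign_cluster <- function(x, centers) {
  d <- apply(centers, 1, function(ctr) sqrt(sum((x - ctr)^2)))
  unname(which.min(d))
}

new_obs <- obs[1, ]  # stand-in for a new observation
new_label <- assign_cluster(new_obs, model$centers)
```

Here `tot.withinss` and `betweenss` correspond to the within-cluster and between-cluster distance measures mentioned above. A genuinely new observation would need to be scaled with the same centre and spread as the training data before calling `assign_cluster()`.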

See the MLHub Survival Guide chapter on k-means and the ready-to-run MLHub package.

References

R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Hosted by Togaware, a pioneer of free and open source software since 1984. Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0