7.14 kmeans example wine dataset cluster
Now that we have a dataset in the right form we can train the k-means model. We'll start with the original dataset before it was normalised.
We can train the model and then use it to predict, extracting the prediction for each observation into a file, wine.pr, that contains just the cluster membership.
The original dataset has three classes so we might compare the clusters with the wine classes:
cat wine.data |
cut -d"," -f 1 |
awk 'NR==1{print "class"} {print}' |
paste -d"," - wine.pr |
sort |
uniq -c
The pairwise counts of wine class and cluster label are reported. There is reasonable overlap. In this example the cluster labelled 0 covers nearly all of wine classes 2 and 3 (69 of 71 and 48 of 48 observations), whilst the cluster labelled 2 covers most of wine class 1 (42 of 59 observations). The remaining observations are "noise".
11 1,0
6 1,1
42 1,2
69 2,0
2 2,2
48 3,0
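The pairwise counts above can also be pivoted into a class-by-cluster table, which may make the overlap easier to read. The following is a minimal sketch rather than part of the original workflow: it assumes the counts have been saved in the same `uniq -c` layout to a hypothetical file pairs.txt (recreated inline here from the figures reported above), that classes run from 1 to 3 and clusters from 0 to 2, and it writes the result to a hypothetical file table.txt.

```shell
# Recreate the reported pairwise counts in a hypothetical file, pairs.txt.
cat > pairs.txt <<'EOF'
     11 1,0
      6 1,1
     42 1,2
     69 2,0
      2 2,2
     48 3,0
EOF

# Pivot the counts: one row per wine class, one column per cluster.
# The ranges 1..3 (classes) and 0..2 (clusters) are assumptions
# taken from the output shown above.
awk '{
  split($2, p, ",")          # p[1] is the wine class, p[2] the cluster label
  n[p[1], p[2]] = $1         # record the count for this class/cluster pair
}
END {
  printf "class"
  for (k = 0; k <= 2; k++) printf "\tc%d", k
  printf "\n"
  for (c = 1; c <= 3; c++) {
    printf "%d", c
    for (k = 0; k <= 2; k++) printf "\t%d", n[c, k] + 0   # missing pairs print as 0
    printf "\n"
  }
}' pairs.txt | tee table.txt
```

Each row of the table sums to the size of that wine class, so the row for class 2 makes it immediately visible that almost all of its observations fall into cluster 0.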
Finally, we can visualise the multi-dimensional clustering using principal components:
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0