7.13 kmeans example wine train and predict

UNDER DEVELOPMENT 20211229 Now that we have a dataset in the right form, we can train the K-Means model:

ml train kmeans 3 wine.csv

The output of training is the model itself, which for k-means is essentially the cluster centroids. The model is written to standard output so that it can be piped into further commands.

We can train the model and then use it to predict, placing each observation into a cluster:

ml train kmeans 3 wine.csv |
  ml predict kmeans wine.csv |
  mlr --csv cut -f label > wine.pr
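Conceptually, the train-then-predict pipeline above is Lloyd's k-means algorithm: fit k centroids to the data, then label each observation with its nearest centroid. The following pure-Python sketch illustrates the idea; the toy 2-D points stand in for the wine data, and the simple "first k points" initialisation is an assumption chosen for brevity.

```python
# A minimal sketch of what `ml train kmeans 3` followed by `ml predict kmeans`
# does conceptually. The toy points below stand in for wine.csv (assumption).

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    centroids = list(points[:k])        # simple init: first k observations
    labels = [0] * len(points)
    for _ in range(iters):
        # "predict" step: assign each point to its nearest centroid
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # "train" step: move each centroid to the mean of its members
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(xs) / len(xs) for xs in zip(*members))
                       if members else centroids[j])
        if new == centroids:            # converged
            break
        centroids = new
    return centroids, labels

data = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0),
        (0.1, 0.2), (5.2, 4.9), (9.1, 0.2)]
centroids, labels = kmeans(data, 3)
print(labels)                           # nearby points share a label
```

Here the three centroids play the role of the model produced by `ml train`, and the label assignment is what `ml predict` adds to each observation.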

Now compare the clusters with the actual wine classes (the first column of wine.data):

cat wine.data |
  cut -d"," -f 1 |
  awk 'NR==1{print "class"} {print}' |
  paste -d"," - wine.pr |
  sort |
  uniq -c
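The pipeline above is a cross-tabulation: pair each observation's class with its cluster label and count the distinct pairs. The same computation can be sketched in Python with `collections.Counter`; the short lists below are hypothetical stand-ins for the first column of wine.data and the label column of wine.pr.

```python
# Pairwise frequency count of (class, cluster label), as produced by
# `paste | sort | uniq -c`. The toy lists stand in for the real columns.
from collections import Counter

classes = [1, 1, 1, 2, 2, 3, 3, 3]      # true wine class per observation
labels  = [2, 2, 0, 0, 0, 0, 0, 1]      # predicted cluster per observation

pairs = Counter(zip(classes, labels))    # count each distinct (class, label)
for (cls, lab), n in sorted(pairs.items()):
    print(f"{n:3d} {cls},{lab}")
```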

This gives us a pairwise count of wine class against cluster label. We can see it is not a great match, but there is some semblance of overlap. The first column is a frequency count, followed by the class and the label separated by a comma. In this example the cluster labelled 0 covers much of wine classes 2 and 3, whilst the cluster labelled 2 covers most of wine class 1. The remaining counts are various “noise.”

 11 1,0
  6 1,1
 42 1,2
 69 2,0
  2 2,2
 48 3,0
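One way to quantify the overlap is to map each cluster to its majority wine class and count how many observations agree. The counts below are exactly those shown above; the majority-match measure itself is one simple choice among several (for example, the adjusted Rand index is a common alternative).

```python
# Map each cluster to its majority class and count the agreement,
# using the (class, cluster) counts listed above.
from collections import defaultdict

counts = {(1, 0): 11, (1, 1): 6, (1, 2): 42,
          (2, 0): 69, (2, 2): 2, (3, 0): 48}

by_cluster = defaultdict(dict)
for (cls, cluster), n in counts.items():
    by_cluster[cluster][cls] = n

matched = sum(max(classes.values()) for classes in by_cluster.values())
total = sum(counts.values())
print(f"{matched}/{total} = {matched/total:.2f}")
```

Cluster 0 is dominated by class 2 (69), cluster 1 by class 1 (6), and cluster 2 by class 1 (42), so 117 of the 178 observations fall in their cluster's majority class.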
