7.7 kmeans predict
20211015 Having performed a cluster analysis we have effectively fitted a model to the data. The model can now be used to “predict,” or in our case assign, each point to a cluster. The predict command labels each point in a supplied dataset (a csv file) based on the “model”, which is also saved as a csv file.
ml predict kmeans [options] <csvfile>

    -m <model.csv> --model=<model.csv>    Read model from file or else STDIN.
    -o <file.csv>  --output=<file.csv>    Save the output predictions to file.
If no input model file is supplied then the model is read from standard input. The model consists of a header row followed by the centres, each with a label. Reading from standard input allows the command to be part of a pipeline of commands, whereby the model data could be piped from the train command. The cluster label is assumed to be in a column named label and the remaining columns are the centres.
The output is in csv format with a header row. The last column, named label, identifies the nearest centre to each point.
$ ml predict kmeans iris.csv -m model.csv
sepal_length,sepal_width,petal_length,petal_width,label
5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,1
4.6,3.1,1.5,0.2,1
...
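To make the assignment step concrete, the following is a minimal Python sketch of labelling each point with its nearest centre. It is not the implementation behind the ml command. It assumes Euclidean distance (the usual choice for k-means), finds the label column by name, and uses a made-up two-column model and dataset. The function names read_model and predict are hypothetical.

```python
import csv
import io
import math


def read_model(stream):
    """Read centres from a model csv with a header row and a 'label' column."""
    reader = csv.DictReader(stream)
    # Every column other than 'label' is treated as a centre coordinate.
    features = [name for name in reader.fieldnames if name != "label"]
    centres = [(row["label"], [float(row[f]) for f in features]) for row in reader]
    return features, centres


def predict(data_stream, features, centres, out_stream):
    """Write each data row with the label of its nearest centre appended."""
    reader = csv.DictReader(data_stream)
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerow(features + ["label"])
    for row in reader:
        point = [float(row[f]) for f in features]
        # Assumption: nearest means smallest Euclidean distance.
        label, _ = min(centres, key=lambda c: math.dist(point, c[1]))
        writer.writerow([row[f] for f in features] + [label])


# Hypothetical model and data, made up for illustration only.
MODEL = "x,y,label\n0.0,0.0,0\n10.0,10.0,1\n"
DATA = "x,y\n1.0,2.0\n9.0,8.5\n"

features, centres = read_model(io.StringIO(MODEL))
out = io.StringIO()
predict(io.StringIO(DATA), features, centres, out)
print(out.getvalue(), end="")
```

Passing sys.stdin as the stream to read_model would mirror the case where no model file is supplied and the model arrives through a pipe.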
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0