7.6 kmeans train

20211016

ml train kmeans [options] <k> [csvfile]
     -o <file.csv> --output=<file.csv>     Save the model to a csv file or the movie to an mp4 file.
                   --view                  Popup a movie of the clustering.

With no input csvfile, the data is read from standard input. This allows the command to be part of a pipeline, with the training data (in csv format, including a header row) piped from another operation.
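As a minimal sketch of such a pipeline, the following drops an identifier column with cut before handing the data to kmeans on standard input. The file name measurements.csv, its illustrative rows, and the helper numeric_columns are assumptions made for this example, not part of MLHub. The ml call is guarded so that the sketch still runs where ml is not installed.

```shell
# Hypothetical input file with a header row (illustrative values only).
cat > measurements.csv <<'EOF'
id,sepal_length,sepal_width
1,5.1,3.5
2,4.9,3.0
3,6.3,3.3
EOF

# Hypothetical upstream stage: drop the first (id) column, keep the header.
numeric_columns() {
    cut -d, -f2- "$1"
}

if command -v ml >/dev/null 2>&1; then
    # No csvfile argument, so kmeans reads the training data from stdin.
    numeric_columns measurements.csv | ml train kmeans 2
else
    # ml is not installed: show what would have been piped in.
    echo "ml not found; the pipeline input would be:" >&2
    numeric_columns measurements.csv
fi
```

Any command that writes csv with a header row to standard output could take the place of the cut stage.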

The default output is a csv of the centres, one row per cluster, with a cluster label appended as the final column and a header row in which that column is named label.

If -o or --output= is provided then the filename extension is used to determine the type of the output. This is either the model as a csv file or the movie as an mp4 file. Both can be specified in separate -o arguments.

ml train kmeans 3 test.csv -o centers.csv -o test.mp4
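The extension-based choice of output type can be pictured with a small shell function. This is only an illustration of the rule described above, not MLHub's actual implementation; the function name output_kind and the fallback value "unknown" are hypothetical.

```shell
# Hypothetical helper: classify an output filename by its extension.
output_kind() {
    case "$1" in
        *.csv) echo "model" ;;    # cluster centres saved as csv
        *.mp4) echo "movie" ;;    # clustering animation saved as mp4
        *)     echo "unknown" ;;  # assumption: anything else is not handled
    esac
}

# The two -o arguments from the example command above.
for f in centers.csv test.mp4; do
    printf '%s -> %s\n' "$f" "$(output_kind "$f")"
done
```

Because each -o argument is classified on its own, supplying one of each kind in a single command is unambiguous.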

If no csv output file is specified, the centres are always written to the terminal, irrespective of whether an mp4 is also output or whether --view is requested.

The output might look something like:

$ ml train kmeans 3 iris.csv
sepal_length,sepal_width,petal_length,petal_width,label
6.85,3.07,5.71,2.05,0
5.00,3.42,1.46,0.24,1
5.88,2.74,4.38,1.43,2
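Because the model is plain csv, it can be inspected with standard tools. The sketch below assumes a saved file has the same layout as the terminal output shown above; it recreates that sample as centers.csv, counts the clusters, and looks up the centre for a given label. The helper name centre_for is hypothetical.

```shell
# Recreate the sample output above as a file (assumed layout of a saved model).
cat > centers.csv <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,label
6.85,3.07,5.71,2.05,0
5.00,3.42,1.46,0.24,1
5.88,2.74,4.38,1.43,2
EOF

# Number of clusters: every line except the header is one centre.
k=$(( $(wc -l < centers.csv) - 1 ))
echo "clusters: $k"

# Hypothetical helper: print the centre whose label column equals $2.
# The label column is located by name in the header, not by position.
centre_for() {
    awk -F, -v want="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "label") col = i; next }
        $col == want { print }
    ' "$1"
}

centre_for centers.csv 1
```

Locating the label column by its header name keeps the lookup working if the column order differs from the sample.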


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.