7.8 kmeans pipeline

20211015 As with all mlhub commands, a goal is to support powerful combinations of commands through pipelines. For example, a csv file might pass through a number of preprocessing steps, such as normalising the columns, before being piped into the train command. The output of train is then piped into the predict command, which outputs a csv file with each observation labelled with a cluster number.

$ cat iris.csv | ml train kmeans 3 | ml predict kmeans iris.csv

sepal_length,sepal_width,petal_length,petal_width,label
5.0,3.6,1.4,0.2,1
7.7,3.8,6.7,2.2,0
6.1,3.0,4.9,1.8,2
5.4,3.7,1.5,0.2,2
...
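
The pipeline above does not include the normalisation step mentioned earlier. As a minimal sketch of what such a step might look like, the shell function below min-max scales each column of a csv file to the range 0 to 1 using awk. The name normalise_csv is hypothetical and is not part of mlhub, and the sketch assumes a header row followed by purely numeric, comma-separated columns with no quoting.

```shell
# normalise_csv: hypothetical helper, NOT an mlhub command.
# Reads a csv file with a header row on stdin and writes a min-max
# scaled copy (each column mapped to 0..1) to stdout.
# Assumes every column is numeric and fields contain no quoted commas.
normalise_csv() {
  awk -F, -v OFS=, '
    NR == 1 { print; next }            # pass the header through unchanged
    {
      n = NR - 1                       # index of the current data row
      nf = NF
      for (i = 1; i <= NF; i++) {
        v[n, i] = $i                   # keep the value for the second pass
        if (n == 1 || $i + 0 < min[i]) min[i] = $i + 0
        if (n == 1 || $i + 0 > max[i]) max[i] = $i + 0
      }
    }
    END {
      for (r = 1; r <= n; r++) {
        line = ""
        for (i = 1; i <= nf; i++) {
          range = max[i] - min[i]
          # A constant column has no spread, so map it to 0.
          s = (range == 0) ? 0 : (v[r, i] - min[i]) / range
          line = line (i > 1 ? OFS : "") sprintf("%.3f", s)
        }
        print line
      }
    }'
}

# Possible use with the commands shown above (iris_norm.csv is an
# illustrative file name):
#   normalise_csv < iris.csv > iris_norm.csv
#   cat iris_norm.csv | ml train kmeans 3 | ml predict kmeans iris_norm.csv
```

The scaled data is written to a file first because the predict command reads its observations from a named file, and those observations should be on the same scale as the data used for training.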


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0