7.17 kmeans pipelines
20220316
As with all mlhub commands, a goal is to provide powerful combinations of commands through pipelines. We have already seen this in the chapter, where we processed a csv file through a number of steps: normalising the columns, piping the csv data into the train command, and then into the predict command to output a csv file with each observation labelled with a cluster number. Collected below are sample pipelines that illustrate different data flows.
The output will be similar to the following:
sepal_length,sepal_width,petal_length,petal_width,label
5.0,3.6,1.4,0.2,1
7.7,3.8,6.7,2.2,0
6.1,3.0,4.9,1.8,2
5.4,3.7,1.5,0.2,2
...
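A quick sanity check on labelled output like this is to count how many observations fall into each cluster. The following is a minimal sketch using only standard awk and sort rather than any mlhub command. It embeds the four sample rows shown above in a shell variable (the name `sample` is purely illustrative); in a real pipeline the csv output of the predict command would be piped in instead.

```shell
# The four sample rows shown above, held in a variable for illustration.
sample='sepal_length,sepal_width,petal_length,petal_width,label
5.0,3.6,1.4,0.2,1
7.7,3.8,6.7,2.2,0
6.1,3.0,4.9,1.8,2
5.4,3.7,1.5,0.2,2'

# Skip the header row, tally the last field (the label), then sort by label.
counts=$(printf '%s\n' "$sample" |
  awk -F, 'NR > 1 { n[$NF]++ } END { for (k in n) print k "," n[k] }' |
  sort)

# Print the tally as a small csv.
printf 'label,count\n%s\n' "$counts"
```

For these four rows the tally shows one observation in each of clusters 0 and 1, and two in cluster 2.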
To visualise the final clustering in a popup display:
We can include within the pipeline a normalise operation:
cat wine.csv |
ml normalise kmeans |
tee norm.csv |
ml train kmeans 4 |
ml predict kmeans norm.csv |
mlr --csv cut -f label |
paste -d"," wine.csv -
After normalising the input dataset the result is saved to a file norm.csv using tee whilst piping the same data on to the next command to train a clustering. We save to file since we’d like to predict the clusters for each of the normalised observations, then map them back to the original observations. This is accomplished using a combination of mlr to cut the label column from the csv output from the predict command, and then we paste that label column to the original wine.csv.
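The tee and paste pattern described here can be tried without mlhub or mlr installed. The sketch below is an illustration under stated assumptions, not the mlhub implementation: `fake_normalise` and `fake_predict` are hypothetical stand-in functions, the two-row dataset is made up, and cut replaces mlr for extracting the label column. Unlike the real predict command, the stand-in does not read the saved file; it simply appends a label to whatever arrives on standard input.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d) && cd "$workdir"

# Made-up two-row dataset standing in for wine.csv.
printf 'x,y\n10,200\n20,400\n' > original.csv

# Hypothetical stand-in for a normalise step: rescale both columns.
fake_normalise() {
  awk -F, -v OFS=, 'NR == 1 { print; next } { print $1 / 10, $2 / 100 }'
}

# Hypothetical stand-in for a predict step: append a label column.
fake_predict() {
  awk -F, -v OFS=, 'NR == 1 { print $0, "label"; next } { print $0, NR % 2 }'
}

cat original.csv |
  fake_normalise |
  tee saved.csv |                          # keep a copy of the transformed data
  fake_predict |
  cut -d, -f3 |                            # keep only the label column
  paste -d, original.csv - > labelled.csv  # "-" reads the labels from stdin

cat labelled.csv
```

The labels end up alongside the original, untransformed values in labelled.csv, while saved.csv holds the transformed copy captured by tee. This relies on the rows staying in the same order through every step, since paste joins the two inputs line by line.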
Once we have the resulting model and the predictions made on the original data, we can visualise the result as part of a pipeline, whilst also using tee to save the clustering to file:
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0