19.1 CSV Basics

20220112

A verb is required to tell mlr what action to undertake. There are many; some common verbs are:

verb                           action
cat                            output the supplied input
cut -f name,pid                retain selected columns
filter '$id == "u1234567"'     retain rows that meet some conditions
put '$mark = round($total)'    construct new variables
rename 'Patient ID',pid        rename variables
sort -f mark                   sort rows on a particular field
then                           chain verbs to construct a pipeline
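
As an illustration of chaining verbs with then, the following sketch recreates the example.csv used in the rest of this section and runs three of the verbs above as one pipeline. The filter condition and the choice of columns are assumptions made for the example. It uses sort -nr (numeric, descending) rather than sort -f, which sorts lexically.

```shell
# Recreate example.csv (data copied from the listings in this section).
cat > example.csv <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
EOF

# Keep the red rows, sort them by quantity (numeric, descending),
# then retain three columns. Each 'then' passes records to the next verb.
mlr --csv filter '$color == "red"' \
    then sort -nr quantity \
    then cut -f shape,index,quantity \
    example.csv

# Output:
# shape,index,quantity
# square,15,79.2778
# square,48,77.5542
# square,64,77.1991
# circle,16,13.8103
```

The whole pipeline runs in a single mlr process, so the file is read only once, however many verbs are chained.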

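To get the column names of the CSV file, print the first line. With no format flag, mlr reads each line in its default key=value (DKVP) format, so values that have no key are labelled by their position: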

$ mlr cat example.csv | head -1
1=color,2=shape,3=flag,4=index,5=quantity,6=rate

Here we simply cat the whole file. The header line is treated as just another record:

$ mlr cat example.csv
1=color,2=shape,3=flag,4=index,5=quantity,6=rate
1=yellow,2=triangle,3=1,4=11,5=43.6498,6=9.8870
1=red,2=square,3=1,4=15,5=79.2778,6=0.0130
1=red,2=circle,3=1,4=16,5=13.8103,6=2.9010
1=red,2=square,3=0,4=48,5=77.5542,6=7.4670
1=purple,2=triangle,3=0,4=51,5=81.2290,6=8.5910
1=red,2=square,3=0,4=64,5=77.1991,6=9.5310
1=purple,2=triangle,3=0,4=65,5=80.1405,6=5.8240
1=yellow,2=circle,3=1,4=73,5=63.9785,6=4.2370
1=yellow,2=circle,3=1,4=87,5=63.5058,6=8.3350
1=purple,2=square,3=0,4=91,5=72.3735,6=8.2430

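Specify that the input is a CSV file using --icsv. The header line now supplies the field names, while the output remains in the default key=value format: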

$ mlr --icsv cat example.csv
color=yellow,shape=triangle,flag=1,index=11,quantity=43.6498,rate=9.8870
color=red,shape=square,flag=1,index=15,quantity=79.2778,rate=0.0130
color=red,shape=circle,flag=1,index=16,quantity=13.8103,rate=2.9010
color=red,shape=square,flag=0,index=48,quantity=77.5542,rate=7.4670
color=purple,shape=triangle,flag=0,index=51,quantity=81.2290,rate=8.5910
color=red,shape=square,flag=0,index=64,quantity=77.1991,rate=9.5310
color=purple,shape=triangle,flag=0,index=65,quantity=80.1405,rate=5.8240
color=yellow,shape=circle,flag=1,index=73,quantity=63.9785,rate=4.2370
color=yellow,shape=circle,flag=1,index=87,quantity=63.5058,rate=8.3350
color=purple,shape=square,flag=0,index=91,quantity=72.3735,rate=8.2430

Specify that both input and output are CSV using --csv:

$ mlr --csv cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
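
The same pattern of input and output flags extends to other formats. As a minimal sketch, the commands below pair --icsv with two of Miller's other output flags, --opprint (aligned columns) and --ojson (JSON). The head verb and the record counts are choices made for this example, to keep the output short.

```shell
# Recreate example.csv (data copied from the listings in this section).
cat > example.csv <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
EOF

# CSV in, aligned (pretty-printed) columns out; first two records only.
mlr --icsv --opprint head -n 2 example.csv

# CSV in, JSON out; first record only.
mlr --icsv --ojson head -n 1 example.csv
```

The input flag fixes how the header and fields are parsed, and the output flag is chosen independently of it.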


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0