8.5 apriori train 🚧
ml train apriori [options] [file.csv]
  -c <0-1>   --confidence=<0-1>  Minimum confidence threshold.
             --id=<name>         The id column name.
  -o <file>  --output=<file>     Save itemsets to .csv or .rds file.
  -s <0-1>   --support=<0-1>     Minimum support threshold.
The input file is a two-column csv file. One column is the basket
id and the other is an item in that basket. The item column can have
any name. If no data file is supplied the data is read from standard
input, often as part of a pipeline. An example input data file might be:
id,item
u1234567,comp1234
u1234567,comp2345
u1234567,comp3456
u1234567,comp4567
u1234568,comp1234
u1234568,comp4567
...
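To make the layout concrete, the following Python sketch groups a two-column file like the one above into baskets. It is an illustration only and not part of ml: the helper name read_baskets is hypothetical, and it assumes the id column is called id while the other column holds the item.

```python
import csv
import io
from collections import defaultdict


def read_baskets(handle, id_column="id"):
    """Group a two-column basket csv into {basket id: set of items}.

    Hypothetical helper for illustration; not part of the ml command.
    """
    reader = csv.DictReader(handle)
    # The item column can have any name, so use whichever column is not the id.
    item_column = next(name for name in reader.fieldnames if name != id_column)
    baskets = defaultdict(set)
    for row in reader:
        baskets[row[id_column]].add(row[item_column])
    return dict(baskets)


# The sample rows from the example input file above.
sample = io.StringIO(
    "id,item\n"
    "u1234567,comp1234\n"
    "u1234567,comp2345\n"
    "u1234567,comp3456\n"
    "u1234567,comp4567\n"
    "u1234568,comp1234\n"
    "u1234568,comp4567\n"
)
baskets = read_baskets(sample)
for basket_id, items in sorted(baskets.items()):
    print(basket_id, sorted(items))
```

Each basket becomes one transaction, which is the unit that the support and confidence thresholds are measured against.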
Output to stdout (by default) is a row for each association rule, together with a number of measures of the quality of the rule:
$ ml train apriori mcomp.csv
rule,support,confidence,coverage,lift,count
comp1234:comp2345=>comp4567,0.6,1.0,0.6,1.4,6
...
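Because the rules arrive as csv they are easy to post-process. The Python sketch below reads the sample output shown above and splits the rule column into its two sides. The helper parse_rule is hypothetical, and the format it assumes (items separated by a colon, sides separated by =>) is inferred only from the single example row, so treat it as an assumption rather than a specification.

```python
import csv
import io


def parse_rule(rule):
    """Split a rule such as 'a:b=>c' into (antecedent items, consequent items).

    Assumes ':' separates items and '=>' separates the two sides,
    as suggested by the example output; this is not a documented format.
    """
    lhs, rhs = rule.split("=>")
    return lhs.split(":"), rhs.split(":")


# The header and first rule from the example output above.
output = io.StringIO(
    "rule,support,confidence,coverage,lift,count\n"
    "comp1234:comp2345=>comp4567,0.6,1.0,0.6,1.4,6\n"
)
rows = list(csv.DictReader(output))
antecedent, consequent = parse_rule(rows[0]["rule"])
confidence = float(rows[0]["confidence"])
print(antecedent, "=>", consequent, "confidence", confidence)
```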
The count for a rule A=>B is the number of transactions that contain
the items in A and B. The support is the proportion of transactions
that contain A and B (i.e., the count over the total number of
transactions in the dataset). The confidence is the probability of B
being in a transaction whenever A is in the transaction (i.e., the
count over the total number of transactions that contain A). The
coverage is the proportion of transactions that contain A. The lift is
a measure of the correlation between A and B. If it is 1 then there is
no correlation. Above 1 is a positive correlation and below 1 is a
negative correlation.
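The definitions above can be checked by hand on a small dataset. The following Python sketch computes each measure directly from a list of baskets. The function name rule_measures and the toy baskets are hypothetical. Lift is computed with the usual definition, the confidence divided by the proportion of transactions that contain B, which is an assumption about how ml calculates it.

```python
def rule_measures(baskets, antecedent, consequent):
    """Compute count, support, confidence, coverage and lift for A=>B.

    Hypothetical helper for illustration; baskets is a list of item sets.
    """
    a, b = set(antecedent), set(consequent)
    total = len(baskets)
    n_a = sum(1 for items in baskets if a <= items)          # contain A
    n_b = sum(1 for items in baskets if b <= items)          # contain B
    count = sum(1 for items in baskets if (a | b) <= items)  # contain A and B
    if n_a == 0 or n_b == 0:
        raise ValueError("A and B must each occur in at least one basket")
    confidence = count / n_a
    return {
        "count": count,
        "support": count / total,
        "confidence": confidence,
        "coverage": n_a / total,
        # Assumed (standard) definition: confidence over the support of B.
        "lift": confidence / (n_b / total),
    }


# Toy data, not taken from mcomp.csv.
toy = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "c"},
    {"b"},
    {"c"},
]
measures = rule_measures(toy, ["a", "b"], ["c"])
for name, value in measures.items():
    print(name, value)
```

In this toy data every basket that contains both a and b also contains c, so the confidence is 1, and because c appears in only four of the five baskets the lift is above 1.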
As with itemsets, the output can be saved to a named csv file using
--output (-o), with the argument being a filename including the
.csv extension. If the filename extension is .rds then the result is
saved as a single object in the .rds file.
ml train apriori -o rules.csv mcomp.csv
ml train apriori -o rules.rds mcomp.csv
Other options are similar to itemsets with the addition of
--confidence (-c) as a minimum threshold for the
confidence of the rules that we are interested in.
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0