14.5 Model Building
20200607

We now build, fit, or train a model. R has implementations of most
machine learning algorithms. We will begin with a simple
favourite, the decision tree algorithm, using
rpart::rpart(). We record this information using the generic
variables mtype (type of the model) and
mdesc (human readable description of the model).
mtype <- "rpart"
mdesc <- "decision tree"
The model will be built from the training dplyr::slice() of the
dataset ds, keeping only the variables chosen with dplyr::select()
and tidyselect::all_of(). The training slice is identified by the
row numbers stored as tr, and the column names to keep are stored as
vars. This training dataset is piped on to
rpart::rpart() together with a specification of the model to
be built, as stored in
form. Using generic variables allows
us to change the formula, the dataset, the observations and the
variables used in building the model yet retain the same programming
code. The resulting model is saved into the variable model.
ds %>%
  select(all_of(vars)) %>%
  slice(tr) %>%
  rpart(form, .) ->
model
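To see the generic-variable pattern at work without the weather dataset, here is a minimal sketch on the kyphosis data that ships with the rpart package. It is an illustration under stated assumptions, not the book's own code: it uses only base R and rpart, replaces the dplyr pipeline with the base R indexing ds[tr, vars], and the helper build_model(), the 70/30 split and the seed are hypothetical choices made for the example.

```r
# Minimal sketch (assumption): base R + rpart only, using rpart's bundled
# kyphosis and R's iris data instead of the weather dataset in the text.
library(rpart)

# Hypothetical helper: build a tree from the generic variables.
# ds[tr, vars] is the base R equivalent of
# ds %>% select(all_of(vars)) %>% slice(tr).
build_model <- function(ds, vars, tr, form) {
  rpart(form, data = ds[tr, vars])
}

# Generic variables for the kyphosis data.
ds     <- kyphosis
target <- "Kyphosis"
vars   <- c(target, "Age", "Number", "Start")
form   <- as.formula(paste(target, "~ ."))

set.seed(42)                                   # illustrative seed only
tr <- sample(nrow(ds), round(0.7 * nrow(ds)))  # 70% of row numbers for training

model <- build_model(ds, vars, tr, form)

# The same modelling code, reused unchanged on a different dataset.
ds     <- iris
target <- "Species"
vars   <- names(ds)
form   <- as.formula(paste(target, "~ ."))
tr     <- sample(nrow(ds), round(0.7 * nrow(ds)))

model_iris <- build_model(ds, vars, tr, form)
```

Moving from kyphosis to iris only required new values for ds, vars, form and tr; build_model() itself did not change, which is the benefit of generic variables described in the paragraph above.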
To view the model simply reference the generic variable
model on the command line. This asks R to
base::print() the model.
## n= 123722
##
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
##
##  1) root 123722 25921 No (0.7904900 0.2095100)
##    2) humidity_3pm< 71.5 104352 14394 No (0.8620630 0.1379370) *
##    3) humidity_3pm>=71.5 19370 7843 Yes (0.4049045 0.5950955)
##      6) humidity_3pm< 83.5 11433 5331 No (0.5337182 0.4662818)
##       12) wind_gust_speed< 42 6885 2533 No (0.6320988 0.3679012) *
##       13) wind_gust_speed>=42 4548 1750 Yes (0.3847845 0.6152155) *
##      7) humidity_3pm>=83.5 7937 1741 Yes (0.2193524 0.7806476) *
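Each line of the printed tree gives the node number, the split leading to that node, the number of observations n, the loss, the predicted class yval, and the class proportions yprob (in the order No, Yes). With rpart's default loss matrix and no case weights, the loss is simply the number of observations at the node that are not of class yval. As a check on that reading, the figures below are copied from the output above and verified with plain arithmetic in R; no model is refitted, and the variable names are chosen just for this example.

```r
# Figures copied from the printed weather tree (no refitting).
# n: observations reaching each node; loss: how many are NOT of class yval.
n    <- c("1" = 123722, "2" = 104352, "3" = 19370, "6" = 11433,
          "7" = 7937, "12" = 6885, "13" = 4548)
loss <- c("1" = 25921, "2" = 14394, "3" = 7843, "6" = 5331,
          "7" = 1741, "12" = 2533, "13" = 1750)
# Printed yprob of the class that is NOT the node's yval.
p_other <- c("1" = 0.2095100, "2" = 0.1379370, "3" = 0.4049045,
             "6" = 0.4662818, "7" = 0.2193524, "12" = 0.3679012,
             "13" = 0.3847845)

# Children partition their parent: node k splits into nodes 2k and 2k + 1.
stopifnot(n[["2"]] + n[["3"]] == n[["1"]],
          n[["6"]] + n[["7"]] == n[["3"]],
          n[["12"]] + n[["13"]] == n[["6"]])

# loss / n reproduces the printed proportion of the minority class.
stopifnot(all(abs(loss / n - p_other) < 1e-6))

# Training (resubstitution) error: total loss over the terminal nodes.
leaves <- c("2", "7", "12", "13")
errors <- sum(loss[leaves])
errors              # 20418
errors / n[["1"]]   # about 0.165, versus 0.2095 at the root
```

Read this way, the four terminal nodes misclassify about 16.5% of the training observations, compared with about 21% for the root alone (always predicting No). This is an error rate on the training data; held-out data would be needed to judge how well it carries over.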
This is our first predictive model. Be sure to spend some time understanding and reflecting on the knowledge that the model exposes.
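One way to reflect on what a tree has learned is to read each terminal node as a rule: the conjunction of the splits on its path from the root. The rpart package provides path.rpart() for retrieving those paths. The sketch below is an assumption-laden illustration rather than the book's method: it fits a tree to rpart's bundled kyphosis data in place of the weather dataset, and locates the terminal nodes through model$frame, where row names hold the node numbers and leaves have var equal to "<leaf>".

```r
# Sketch (assumption): rpart's bundled kyphosis data stands in for the
# weather dataset used in the text.
library(rpart)

model <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# model$frame has one row per node; row names are node numbers and
# terminal nodes are marked with var == "<leaf>".
frame   <- model$frame
is_leaf <- frame$var == "<leaf>"
leaves  <- as.integer(rownames(frame)[is_leaf])

# path.rpart() returns, for each requested node, the splits leading to it,
# starting from "root".
paths <- path.rpart(model, nodes = leaves, print.it = FALSE)

# yval holds a class index; attr(model, "ylevels") maps it to a label.
classes <- attr(model, "ylevels")[frame$yval[is_leaf]]
counts  <- frame$n[is_leaf]

# Print each leaf as an IF ... THEN rule with its predicted class.
for (i in seq_along(paths)) {
  conds <- paths[[i]][-1]            # drop the leading "root" entry
  cat(sprintf("Node %s: IF %s THEN %s (n = %d)\n",
              names(paths)[i], paste(conds, collapse = " AND "),
              classes[i], counts[i]))
}
```

Written out like this, each rule can be checked against domain knowledge, which is often easier than reading the indented tree.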
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.