14.10 Confusion Matrix

Accuracy and error rate are rather blunt measures of performance. They are a good starting point for judging a model, but they do not show which kinds of errors the model makes. A confusion matrix cross-tabulates the predicted classes against the actual classes so that we can review where the model is right and where it is wrong.

round(100*table(actual_te, predict_te, dnn=c("Actual", "Predicted"))/length(actual_te))
##       Predicted
## Actual No Yes
##    No  75   3
##    Yes 14   8

This is useful because different kinds of wrong prediction carry different consequences. For example, if it is incorrectly predicted that it will not rain tomorrow and we decide not to carry an umbrella, then the consequence is that we get wet. We might experience this as a more severe consequence than the situation where it is incorrectly predicted that it will rain and so we unnecessarily carry an umbrella with us all day.
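As a minimal sketch of how the two kinds of error can be read off a confusion matrix, the block below rebuilds the test-dataset percentages shown above as a plain matrix and indexes the cells by name. The object names cm, fn and fp are hypothetical and introduced only for illustration.

```r
# Percentages copied from the test-dataset output above (filled column by column).
cm <- matrix(c(75, 14, 3, 8), nrow=2,
             dimnames=list(Actual=c("No", "Yes"), Predicted=c("No", "Yes")))

# False negative: it actually rained (Actual Yes) but the model predicted No.
fn <- cm["Yes", "No"]

# False positive: it stayed dry (Actual No) but the model predicted Yes.
fp <- cm["No", "Yes"]

c(false_negative=fn, false_positive=fp)
```

Indexing by row and column name rather than by position makes it harder to mix up the two error types when the table is transposed or the class order changes.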

We again compare this to the performance on the training dataset and note that the model performs better there, at least when predicting that it will not rain tomorrow.

round(100*table(actual_tr, predict_tr, dnn=c("Actual", "Predicted"))/length(actual_tr))

Notice that the false negative rate (the errors that have a higher consequence, namely getting uncomfortably wet if it does actually rain) comes out at 14% on both the tr and te datasets once rounded, so the model shows no advantage on this error for the data it was trained on. The performance as measured over the te dataset is likely to be more indicative of the actual model performance, and the false negative rate is one that we would rather minimize.
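The 14% quoted here is a share of all observations. A stricter definition of the false negative rate, which is an assumption added here and not the calculation used in the text, divides the false negatives by the number of observations that were actually Yes. The sketch below approximates this from the rounded test-dataset percentages using base::prop.table(); the names cm and by_actual are hypothetical.

```r
# Rounded percentages copied from the test-dataset output above.
cm <- matrix(c(75, 14, 3, 8), nrow=2,
             dimnames=list(Actual=c("No", "Yes"), Predicted=c("No", "Yes")))

# Normalise within each row, so each cell is a percentage of its actual class.
by_actual <- round(100*prop.table(cm, margin=1))

# Of the days when it actually rained, the percentage predicted as No.
by_actual["Yes", "No"]
```

Because the input is already rounded, the result (about 64%) is only approximate, but it suggests that most rainy days in the test dataset are missed even though they make up only 14% of all observations.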

We will be generating confusion matrices quite regularly from the predicted classes (counting just those that are not missing using base::is.na()) and the target classes. This is another candidate for wrapping up into a function, to save us having to remember the command in detail and having to type it out each time.

con <- function(predicted, actual) {
  # Cross-tabulate, then convert counts to percentages of non-missing predictions.
  tbl <- table(actual, predicted, dnn=c("Actual", "Predicted"))
  tbl <- round(100*tbl/sum(!is.na(predicted)))
  tbl
}

We can simply call this function with the appropriate arguments to have the confusion matrix printed.

con(predict_tr, actual_tr)
##       Predicted
## Actual No Yes
##    No  76   3
##    Yes 14   8
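To see how missing predictions are handled, here is a small self-contained sketch. The function is restated with braces so that the block runs on its own, and the two toy vectors are hypothetical data invented for illustration, not taken from the weather dataset.

```r
con <- function(predicted, actual) {
  tbl <- table(actual, predicted, dnn=c("Actual", "Predicted"))
  round(100*tbl/sum(!is.na(predicted)))
}

# Hypothetical toy data: five observations, one with a missing prediction.
actual    <- factor(c("No", "No", "Yes", "Yes", "No"), levels=c("No", "Yes"))
predicted <- factor(c("No", "Yes", "No", NA, "No"), levels=c("No", "Yes"))

# table() drops the pair with the missing prediction by default, so four
# pairs remain and each cell is a percentage of those four.
res <- con(predicted, actual)
res
```

Dividing by sum(!is.na(predicted)) rather than length(predicted) keeps the percentages summing to roughly 100 when some predictions are missing. The sketch assumes that the actual classes themselves contain no missing values.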

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0