14.9 Accuracy and Error Rate
From the two vectors cl_te and target_te we can calculate the overall accuracy of the predictions over the test dataset. This will simply be the sum of the number of times the prediction agrees with the actual class, divided by the size of the test dataset (which is the same as the length of target_te).
## [1] 82.99
Here we can see that the model has an overall accuracy of 82.99%. That is a relatively high accuracy for a typical model build.
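The calculation described above can be sketched as follows. The vectors predicted and actual are small hypothetical stand-ins for cl_te and target_te (their values are invented purely for illustration), so the result differs from the 82.99% reported for the real model.

```r
# Hypothetical stand-ins for cl_te (predictions) and target_te (actual classes).
predicted <- c("yes", "no", "no",  "yes", "no", "no", "yes", "no")
actual    <- c("yes", "no", "yes", "yes", "no", "no", "no",  "no")

# Count the agreements and divide by the number of observations.
acc <- sum(predicted == actual) / length(actual)

# Express the proportion as a percentage rounded to 2 decimal places.
round(100 * acc, 2)
```

Here 6 of the 8 toy predictions agree with the actual class, giving an accuracy of 75%.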
We can also calculate the overall error rate in a similar fashion. Some Data Scientists prefer to talk in terms of the error rate rather than the accuracy:
## [1] 17.01
Thus our decision tree model has an overall error rate of 17.01%.
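A minimal sketch of the error rate calculation, again using hypothetical toy vectors in place of cl_te and target_te, counts the disagreements rather than the agreements. It also checks that the error rate is the complement of the accuracy.

```r
# Hypothetical stand-ins for cl_te (predictions) and target_te (actual classes).
predicted <- c("yes", "no", "no",  "yes", "no", "no", "yes", "no")
actual    <- c("yes", "no", "yes", "yes", "no", "no", "no",  "no")

# Count the disagreements and divide by the number of observations.
err <- sum(predicted != actual) / length(actual)

# Express the proportion as a percentage rounded to 2 decimal places.
round(100 * err, 2)

# The error rate and the accuracy always sum to 1.
acc <- sum(predicted == actual) / length(actual)
all.equal(err, 1 - acc)
```

This complement relationship is consistent with the figures reported in the text: 82.99% and 17.01% sum to 100%.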
Notice also that we have now twice converted a proportion (generally a number between 0 and 1) into a percentage (generally a number between 0 and 100) by multiplying the proportion by 100 and then rounding it to 2 decimal places with base::round(). We will no doubt want to do this regularly (if we find percentages to be more quickly accessible than proportions), so it is a candidate for packaging up as a function. To do so we use base::function() and provide it with a single argument, the number we wish to convert to a percentage:
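One way such a helper might be written is sketched below. The function name per and its argument name x are assumptions made for this illustration, not necessarily the names used in the original code.

```r
# Convert a proportion (0 to 1) into a percentage rounded to 2 decimal places.
# The names per and x are hypothetical, chosen for this sketch.
per <- function(x) round(100 * x, 2)
```

Because the multiplication and base::round() are both vectorised, the same function also accepts a vector of proportions.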
We can now use this as a convenience:
## [1] 82.99
## [1] 17.01
To illustrate the more optimistic measure that we obtain when we apply our model to the training dataset we can repeat the above calculations:
## [1] 83.26
## [1] 16.74
The overall accuracy over the training dataset is 83.26% compared to the 82.99% accuracy calculated over the test dataset. The difference for this small dataset is small but we do see that the accuracy is higher on the training dataset. Similarly the overall error rate is 16.74% on the training dataset compared to the test error rate of 17.01%.
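As a quick worked check of the comparison above, the size of the gap can be computed directly from the four reported percentages. The variable names here are hypothetical. The values are those quoted in the text.

```r
# Percentages reported in the text for the training (tr) and test (te) datasets.
acc_tr <- 83.26
acc_te <- 82.99
err_tr <- 16.74
err_te <- 17.01

# How optimistic is the training estimate, in percentage points?
round(acc_tr - acc_te, 2)

# The error rate gap is the same size, in the opposite direction.
round(err_te - err_tr, 2)
```

Both differences come to 0.27 percentage points.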
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0