10.8 rain demo variable selection
When the model is built, the algorithm chooses a variable for each node of the resulting decision tree. A measure based on information theory, such as entropy or the Gini index, is used to score each candidate variable. The variable with the highest score on this measure is chosen for that node.
Below we see the calculations that were made for the root node of the tree (Node number 1). A number of variables were considered, and the variable with the top score, humidity_3pm, was chosen for this node. The improve= value is the score that results from the calculation.
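To make the scoring concrete, here is a minimal Python sketch of a Gini-based selection. It is not the rpart implementation. It assumes the score is the number of observations multiplied by the reduction in Gini impurity, and it ignores the priors and loss matrices that rpart can also take into account. The function names, the variable names var_a to var_c and all class counts are hypothetical and are not taken from the rain data.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def improvement(left, right):
    """Observation-weighted drop in Gini impurity for a candidate split."""
    parent = [l + r for l, r in zip(left, right)]
    n_left, n_right, n = sum(left), sum(right), sum(parent)
    children = (n_left / n) * gini(left) + (n_right / n) * gini(right)
    return n * (gini(parent) - children)


# Hypothetical (no, yes) class counts in the left and right child for
# three candidate splits of the same 100 observations. These are NOT
# the counts from the rain data.
candidates = {
    "var_a": ([70, 10], [10, 10]),
    "var_b": ([60, 15], [20, 5]),
    "var_c": ([45, 5], [35, 15]),
}

# Score every candidate and keep the one with the highest improvement.
scores = {name: improvement(l, r) for name, (l, r) in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: improve={score:.2f}")
print("chosen:", best)
```

In this toy example var_b leaves both children with the same class mix as the parent, so it gains nothing, while var_a separates the classes best and is the one chosen.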
Node number 1: 123722 observations,    complexity param=0.342
  predicted class=no   expected loss=0.4  P(node) =1
    class counts: 97753 25969
   probabilities: 0.600 0.400
  left son=2 (92796 obs) right son=3 (30926 obs)
  Primary splits:
      humidity_3pm < 64.5  to the left,  improve=11510, (0 missing)
      rainfall     < 0.35  to the left,  improve= 7486, (0 missing)
      rain_today   splits as LR,         improve= 7133, (0 missing)
      cloud_3pm    < 6.5   to the left,  improve= 5030, (0 missing)
      humidity_9am < 73.5  to the left,  improve= 4535, (0 missing)
  Surrogate splits:
      cloud_3pm    < 7.5   to the left,  agree=0.778, adj=0.112, (0 split)
      humidity_9am < 87.5  to the left,  agree=0.775, adj=0.100, (0 split)
      sunshine     < 3.25  to the right, agree=0.772, adj=0.088, (0 split)
      temp_3pm     < 12.55 to the right, agree=0.771, adj=0.085, (0 split)
      rainfall     < 4.85  to the left,  agree=0.766, adj=0.063, (0 split)
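The adj= values in the surrogate listing can be reproduced from the numbers shown. The sketch below assumes that adj is the surrogate's agreement measured against the baseline of sending every observation to the larger child. The listing itself does not state this formula, but it matches the printed figures. The function name adjusted_agreement is hypothetical.

```python
# Counts copied from the Node number 1 listing.
n_total = 123722
n_left = 92796
n_right = 30926

# Baseline: send every observation to the larger child (the majority rule).
majority = max(n_left, n_right) / n_total


def adjusted_agreement(agree, baseline):
    """Share of the baseline's errors that the surrogate corrects (assumed formula)."""
    return (agree - baseline) / (1.0 - baseline)


# agree= values as printed for the first two surrogate splits.
adj_cloud_3pm = adjusted_agreement(0.778, majority)
adj_humidity_9am = adjusted_agreement(0.775, majority)

print(f"majority baseline = {majority:.3f}")
print(f"cloud_3pm    adj = {adj_cloud_3pm:.3f}")
print(f"humidity_9am adj = {adj_humidity_9am:.3f}")
```

Because about three quarters of the observations go to the left, an agreement of 0.778 is only a small gain over the baseline, which is why the adj values are so much lower than the agree values.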
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0