10 Rain Prediction
How do we go about representing knowledge in AI? When we build a model using machine learning we need a target language in which to represent that model. A decision tree is one such target language, and one that is quite simple to visualise. Decision trees are covered in some detail in the Data Science Desktop Survival Guide.
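To make the idea of a target language concrete, here is a minimal Python sketch of a decision tree held as a data structure, together with the walk from root to leaf that produces a prediction. The variable names, thresholds and leaf probabilities are illustrative assumptions chosen for a rain prediction setting. They are not taken from any fitted model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prob_rain: float  # illustrative probability of rain tomorrow


@dataclass
class Node:
    variable: str                  # observation to test at this node
    threshold: float               # split point (made up for this sketch)
    below: Union["Node", Leaf]     # branch taken when value < threshold
    above: Union["Node", Leaf]     # branch taken when value >= threshold


def predict(tree, obs):
    """Walk from the root to a leaf and return that leaf's probability."""
    while isinstance(tree, Node):
        tree = tree.below if obs[tree.variable] < tree.threshold else tree.above
    return tree.prob_rain


# A hand-written tree: the knowledge is the set of tests and leaf values.
TREE = Node(
    "humidity_3pm", 70.0,
    below=Node("sunshine", 4.0, below=Leaf(0.45), above=Leaf(0.10)),
    above=Leaf(0.75),
)

print(predict(TREE, {"humidity_3pm": 82.0, "sunshine": 2.5}))
```

Each path from the root to a leaf reads as a rule, for example "if humidity at 3pm is below 70 and sunshine is at least 4 hours then rain is unlikely", which is why a tree is easy to visualise and explain.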
Decision trees, and the ensemble of decision trees within a random forest, are both very common approaches to building classification models in AI. The concept of an ensemble of decision trees was introduced in my 1988 paper Combining Decision Trees: Initial results from the MIL algorithm, which demonstrates the improved performance obtained from multiple trees. The rattle package in R provides the weatherAUS dataset, which is used here to predict whether it will rain tomorrow (any other target variable could be chosen instead).
The rain package provides a predictive model for the probability of it raining tomorrow based on today’s weather observations. The training dataset consists of daily weather observations from weather stations across Australia, capturing the amount of sunshine, the humidity, the amount of rain today, and so on. This simplest of approaches uses the decision tree induction algorithm to build a model that captures knowledge in the form of a decision tree. Other models, often more accurate but more complex, include the random forest, which builds a forest (that is, a collection) of decision trees and combines them into an ensemble model. Ensembles have been shown over many years to produce more accurate models than a single tree.
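The combining step of an ensemble can be sketched in a few lines of Python. The three trees below are hypothetical hand-written rules with made-up thresholds and probabilities, and averaging their outputs is one common way to combine them (majority voting is another). A real random forest also grows each tree from a random sample of the observations and variables, which this sketch leaves out.

```python
from statistics import mean


# Three hypothetical trees, each reduced to a single if/else rule.
# All thresholds and probabilities are illustrative assumptions.
def tree_a(obs):
    return 0.75 if obs["humidity_3pm"] >= 70 else 0.25


def tree_b(obs):
    return 0.5 if obs["sunshine"] < 4 else 0.25


def tree_c(obs):
    return 0.75 if obs["rain_today_mm"] > 1 else 0.25


FOREST = [tree_a, tree_b, tree_c]


def forest_probability(obs):
    """Average the probability of rain predicted by every tree."""
    return mean(tree(obs) for tree in FOREST)


def forest_predicts_rain(obs, cutoff=0.5):
    """Turn the averaged probability into a yes/no prediction."""
    return forest_probability(obs) >= cutoff


humid_day = {"humidity_3pm": 82.0, "sunshine": 2.5, "rain_today_mm": 3.0}
dry_day = {"humidity_3pm": 40.0, "sunshine": 10.0, "rain_today_mm": 0.0}

print(forest_predicts_rain(humid_day), forest_predicts_rain(dry_day))
```

Because each tree tests a different observation, a single unusual reading sways only one member of the ensemble, which gives some intuition for why the combined model tends to be more robust than any one tree.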
The example model and code come from my Essentials of Data Science.
We install, configure and demonstrate the model with these commands:
ml install rain
ml configure rain
ml readme rain
ml commands rain
ml demo rain
In addition to the demo command, the package supports the predict command.
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0