8.9 Formula to Describe the Goal
20200607 To support analytic modelling tasks we identify the formula used to describe the model to be built. Typically we model the target variable on the input variables, so that given a new set of values for the input variables the resulting model can predict the value of the target variable.
Using stats::formula() we can automatically construct the formula
from the dataset itself, provided the first column of the dataset is
the target variable and the remaining columns are the input
variables. Our usual ordering of columns within a dataset places the
target variable last rather than first. Selecting the columns named
in vars in reverse order, using base::rev(), moves the target to the
front, and stats::formula() then produces the right formula
automatically.
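The step just described can be sketched as follows. This is a minimal illustration only: the small data frame ds is a hypothetical stand-in for the weather dataset, and the name form for the resulting formula is likewise an assumption.

```r
# Hypothetical toy dataset standing in for the weather data. The
# target (rain_tomorrow) is the last column, as in our usual ordering.
ds <- data.frame(min_temp      = c(8.0, 14.0, 13.7),
                 max_temp      = c(24.3, 26.9, 23.4),
                 rain_today    = c("no", "yes", "yes"),
                 rain_tomorrow = c("yes", "yes", "no"))

# Variable names in dataset order: inputs first, target last.
vars <- names(ds)

# Reverse the selection so the target becomes the first column, then
# let formula() derive the model formula from the data frame.
form <- formula(ds[rev(vars)])

print(form)
```

Note that rev() reverses the inputs as well as moving the target to the front. For an additive formula the order of the inputs does not change the meaning of the model.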
## rain_tomorrow ~ min_temp + max_temp + rainfall + evaporation +
## sunshine + wind_gust_dir + wind_gust_speed + wind_dir_9am +
## wind_dir_3pm + wind_speed_9am + wind_speed_3pm + humidity_9am +
## humidity_3pm + pressure_9am + pressure_3pm + cloud_9am +
## cloud_3pm + temp_9am + temp_3pm + rain_today
The notation used to express the formula begins with the name of the
target (rain_tomorrow) followed by a tilde (~) followed by the
variables that will be used to model the target, each separated by a
plus (+). The formula indicates that we will fit a model to predict
rain_tomorrow from the remaining input variables.
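To make the notation concrete, a formula of this shape can also be assembled from variable names with stats::reformulate(). This is only a sketch: the names target and inputs and the three input variables chosen are assumptions for illustration.

```r
# Hypothetical target and a small subset of the input variables.
target <- "rain_tomorrow"
inputs <- c("min_temp", "max_temp", "rain_today")

# reformulate() joins the inputs with + and places the response
# to the left of the tilde.
form <- reformulate(inputs, response = target)

print(form)

# The left-hand side of the formula is the target variable.
print(form[[2]])
```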
A shorthand for this same formulation is rain_tomorrow ~ . where the dot stands for every other variable in the dataset.
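In R a dot on the right-hand side of a formula stands for all columns of the data other than the target. The sketch below, using a hypothetical toy data frame ds in place of the weather dataset, writes the formula with the dot and then expands it against the data to show the inputs it covers.

```r
# Hypothetical toy dataset standing in for the weather data.
ds <- data.frame(min_temp      = c(8.0, 14.0, 13.7),
                 max_temp      = c(24.3, 26.9, 23.4),
                 rain_today    = c("no", "yes", "yes"),
                 rain_tomorrow = c("yes", "yes", "no"))

# The dot stands for every column of the data except the target.
form <- rain_tomorrow ~ .

# Expanding the dot against the data lists the inputs explicitly.
expanded <- formula(terms(form, data = ds))

print(expanded)
```

The dot is only resolved once a dataset is supplied, so the same shorthand formula can be reused as columns are added to or removed from the dataset.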
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0