10.25 Reviewing Variable Names
The names of the variables within a dataset as supplied to us may
not follow any particular form and may mix different conventions. For
example, they may use a mix of upper and lower case letters
(TempToday9AM), or be very long
(Temperature_Recorded_Today_9am), or use sequential numbers
to identify each variable, or use codes
(XVn34_rain), or follow any number of other
conventions. Often we prefer to simplify the variable names to ease
our processing and thinking, and to enforce a standard and consistent
naming convention for ourselves.
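As a rough illustration of what simplifying names can look like, the sketch below converts the example names above to lower case words separated by underscores using only base R. The helper normalise_names() is hypothetical (it is not part of the source text or of any package), and the rules it applies are just one possible convention.

```r
# Hypothetical helper, for illustration only: convert assorted naming
# styles to lower case words separated by underscores.
normalise_names <- function(x) {
  # Put an underscore between a lower case letter and a following
  # upper case letter, so "MinTemp" becomes "Min_Temp".
  x <- gsub("([a-z])([A-Z])", "\\1_\\2", x, perl = TRUE)
  # Collapse any run of characters other than letters and digits
  # into a single underscore.
  x <- gsub("[^A-Za-z0-9]+", "_", x, perl = TRUE)
  # Drop underscores left at the start or end of a name.
  x <- gsub("^_+|_+$", "", x, perl = TRUE)
  tolower(x)
}

examples <- c("TempToday9AM", "Temperature_Recorded_Today_9am", "XVn34_rain")
normalised <- normalise_names(examples)
print(normalised)
```

Under these assumed rules a digit does not start a new word, so TempToday9AM becomes temp_today9am. Whether that is the desired result is a design choice for whoever sets the naming convention.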
We use base::names() to list the names of the variables within a dataset.
# Review the variables to consider normalising their names.
names(ds)

##  [1] "date" "location" "min_temp" "max_temp"
##  [5] "rainfall" "evaporation" "sunshine" "wind_gust_dir"
##  [9] "wind_gust_speed" "wind_dir_9am" "wind_dir_3pm" "wind_speed_9am"
## [13] "wind_speed_3pm" "humidity_9am" "humidity_3pm" "pressure_9am"
## [17] "pressure_3pm" "cloud_9am" "cloud_3pm" "temp_9am"
## [21] "temp_3pm" "rain_today" "risk_mm" "rain_tomorrow"
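For readers without ds to hand, the following self-contained sketch shows the same call on a small made-up data frame. The object toy and its column names are assumptions for illustration only. It also shows that names() can appear on the left of an assignment to replace the names.

```r
# A toy data frame standing in for ds; names and values are made up.
toy <- data.frame(MinTemp   = c(8.0, 14.0),
                  MaxTemp   = c(24.3, 26.9),
                  RainToday = c("No", "Yes"))

# List the variable names as a character vector.
original <- names(toy)
print(original)

# Used on the left of an assignment, names() replaces the names.
names(toy) <- c("min_temp", "max_temp", "rain_today")
print(names(toy))
```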
Notice that the names here use a scheme whereby the initial letter is capitalised and each word within the variable name is also capitalised. That’s a reasonable naming scheme and is preferred by some.
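One way to check which scheme a set of names follows is to test each name against a regular expression. The sketch below is a minimal example under assumed patterns: the vector vars and both patterns are illustrative choices, not definitions from the source.

```r
# Illustrative names only; substitute names(ds) to check a real dataset.
vars <- c("MinTemp", "WindGustDir", "min_temp", "RISK_MM")

# Assumed pattern for the capitalised scheme: one or more words, each
# an upper case letter followed by lower case letters or digits.
is_capitalised <- grepl("^([A-Z][a-z0-9]*)+$", vars, perl = TRUE)

# Assumed pattern for lower case words separated by underscores.
is_lower_underscore <- grepl("^[a-z0-9]+(_[a-z0-9]+)*$", vars, perl = TRUE)

print(data.frame(vars, is_capitalised, is_lower_underscore))
```

A name such as RISK_MM matches neither pattern, which is the kind of inconsistency that a review of the variable names is meant to surface.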
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.