10.24 Normalise Variables
20200912 To rename variables in a dataset we can use
dplyr::rename_with(), which applies a function, like
rattle::normVarNames(), to the variable names and replaces
those names with the result from the function. A tidy alternative is
to use janitor::clean_names() with the option numerals="right"
to replicate rattle::normVarNames().
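As a minimal sketch of the two approaches, the following applies each to a small, hypothetical data frame whose three column names are borrowed from the weather example in this section. The data values are made up for illustration, and the sketch assumes the dplyr, rattle and janitor packages are installed.

```r
library(dplyr)    # rename_with() and tibble()
library(rattle)   # normVarNames()
library(janitor)  # clean_names()

# Hypothetical toy data using three names from the weather example.
ds <- tibble(MinTemp    = c(8.0, 14.0),
             WindDir9am = c("SW", "E"),
             RainToday  = c("No", "Yes"))

# Apply normVarNames() to every variable name.
ds_rattle <- rename_with(ds, normVarNames)

# The janitor alternative. With numerals="right" a digit is expected
# to stay attached to the text that follows it, as in wind_dir_9am.
ds_janitor <- clean_names(ds, numerals = "right")

names(ds_rattle)
names(ds_janitor)
```

Both calls return a new data frame and leave ds unchanged, so the result needs to be assigned if the normalised names are to be kept.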
The choice of variable naming style is suggested in
Chapter 27: all variable names are lowercase with words
separated by an underscore. This normalisation is useful when
different upper/lower case conventions are intermixed inconsistently
in names like Incm_tax_PyBl. Remembering how to capitalise when
interactively exploring data with thousands of such variables can
be quite a cognitive load. Yet we often see such variable names
arising in practice, especially when we import data from databases,
which are often case insensitive.
The example below shows the transformation into the preferred normalised form.
## [1] "Date" "Location" "MinTemp" "MaxTemp"
## [5] "Rainfall" "Evaporation" "Sunshine" "WindGustDir"
## [9] "WindGustSpeed" "WindDir9am" "WindDir3pm" "WindSpeed9am"
## [13] "WindSpeed3pm" "Humidity9am" "Humidity3pm" "Pressure9am"
## [17] "Pressure3pm" "Cloud9am" "Cloud3pm" "Temp9am"
## [21] "Temp3pm" "RainToday" "RISK_MM" "RainTomorrow"
## [1] "date" "location" "min_temp" "max_temp"
## [5] "rainfall" "evaporation" "sunshine" "wind_gust_dir"
## [9] "wind_gust_speed" "wind_dir_9am" "wind_dir_3pm" "wind_speed_9am"
## [13] "wind_speed_3pm" "humidity_9am" "humidity_3pm" "pressure_9am"
## [17] "pressure_3pm" "cloud_9am" "cloud_3pm" "temp_9am"
## [21] "temp_3pm" "rain_today" "risk_mm" "rain_tomorrow"
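Output like that shown above could be produced by a sketch such as the following. It assumes the dataset is weatherAUS from the rattle package, whose column names appear to match the first listing, and it uses the assignment pipe from magrittr. The variable name ds is an illustrative choice.

```r
library(rattle)    # normVarNames() and the weatherAUS dataset
library(dplyr)     # rename_with()
library(magrittr)  # %<>% assignment pipe

# Assumption: weatherAUS is the dataset behind the listings above.
ds <- weatherAUS

# Review the variable names before normalising them.
names(ds)

# Normalise the variable names, overwriting ds with the result.
ds %<>% rename_with(normVarNames)

# Review the normalised variable names.
names(ds)
```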
Notice the use of the assignment pipe here as introduced in
Chapter ??. We will recall that the %<>%
operator pipes the left-hand data to the function on the right-hand
side and then assigns the result back to the left-hand side, overwriting
the original contents of the memory referred to on the left-hand side.
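To make the behaviour of the assignment pipe concrete, the following small sketch compares it with an ordinary pipe followed by an explicit assignment. The vectors x and y are hypothetical and tolower() stands in for any function of the piped value.

```r
library(magrittr)  # provides both %>% and %<>%

# Two identical, hypothetical vectors of variable names.
x <- c("MinTemp", "MaxTemp")
y <- x

# Assignment pipe: pipe x into tolower() and overwrite x.
x %<>% tolower()

# The equivalent using the ordinary pipe and an explicit assignment.
y <- y %>% tolower()

identical(x, y)
```

The assignment pipe saves repeating the name of the object, at the cost of making the overwrite less visible to a reader skimming the code.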
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0