10.26 Rename Variables
20211124 We can rename variables within a pipeline. To normalise the variable names automatically see Section 10.24. If we have a set of new names to use instead of the original names, or instead of the automatically normalised names, then use dplyr::rename() in a pipeline to rename a few variables by pairing each new name with its old name, or dplyr::rename_with() to rename all of them by position. In the latter case it is often simpler to name the variables when the data is loaded, for example using col_names= with readr::read_csv().
The original variable names are obtained using base::names():
# List the original variable names.
names(ds)
## [1] "Date" "Location" "MinTemp" "MaxTemp"
## [5] "Rainfall" "Evaporation" "Sunshine" "WindGustDir"
## [9] "WindGustSpeed" "WindDir9am" "WindDir3pm" "WindSpeed9am"
## [13] "WindSpeed3pm" "Humidity9am" "Humidity3pm" "Pressure9am"
## [17] "Pressure3pm" "Cloud9am" "Cloud3pm" "Temp9am"
## [21] "Temp3pm" "RainToday" "RISK_MM" "RainTomorrow"
We can rename the variables based on their name using dplyr::rename():
# Rename the variables based on their names.
ds %>%
  rename(date=Date, location=Location, min_temp=MinTemp, max_temp=MaxTemp)
## # A tibble: 226,868 × 24
## date location min_temp max_temp Rainfall Evaporation Sunshine
## <date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2008-12-01 Albury 13.4 22.9 0.6 NA NA
## 2 2008-12-02 Albury 7.4 25.1 0 NA NA
## 3 2008-12-03 Albury 12.9 25.7 0 NA NA
## 4 2008-12-04 Albury 9.2 28 0 NA NA
## 5 2008-12-05 Albury 17.5 32.3 1 NA NA
## 6 2008-12-06 Albury 14.6 29.7 0.2 NA NA
## 7 2008-12-07 Albury 14.3 25 0 NA NA
## 8 2008-12-08 Albury 7.7 26.7 0 NA NA
## 9 2008-12-09 Albury 9.7 31.9 0 NA NA
## 10 2008-12-10 Albury 13.1 30.1 1.4 NA NA
## # ℹ 226,858 more rows
## # ℹ 17 more variables: WindGustDir <ord>, WindGustSpeed <dbl>,
## # WindDir9am <ord>, WindDir3pm <ord>, WindSpeed9am <dbl>, WindSpeed3pm <dbl>,
## # Humidity9am <int>, Humidity3pm <int>, Pressure9am <dbl>, Pressure3pm <dbl>,
## # Cloud9am <int>, Cloud3pm <int>, Temp9am <dbl>, Temp3pm <dbl>,
## # RainToday <fct>, RISK_MM <dbl>, RainTomorrow <fct>
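To rename all the variables by their position, pass dplyr::rename_with() a function that returns the complete vector of new names, listed in column order.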
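Renaming by position can be sketched as follows. This is a minimal illustration rather than the book's own code: toy is a small made-up tibble standing in for ds, and new_names is a hypothetical vector that must contain exactly one name per column, in column order.

```r
library(dplyr)

# Hypothetical three-column stand-in for ds.
toy <- tibble(Date     = as.Date(c("2008-12-01", "2008-12-02")),
              Location = c("Albury", "Albury"),
              MinTemp  = c(13.4, 7.4))

# Hypothetical new names, one per column, in column order.
new_names <- c("date", "location", "min_temp")

# The function given to rename_with() ignores the old names and
# simply returns the replacement vector, so names map by position.
renamed <- toy %>%
  rename_with(~ new_names)

names(renamed)
```

Because the mapping is purely positional, a change in column order in the source data would silently attach the wrong names, which is one reason dplyr::rename() by name is safer when only a few variables change.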
If the data is loaded using readr::read_csv() or readxl::read_excel(), for example, then supplying the new names through col_names= at load time is recommended instead.
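Naming the variables at load time might look like the sketch below. It assumes readr 2.0 or later (for I() literal data and show_col_types=), and the CSV text and new_names vector are made up for illustration. Note that when col_names= is given a character vector, readr::read_csv() treats the first row of the file as data, so skip=1 is used here to drop the original header row.

```r
library(readr)

# Hypothetical CSV content; I() marks the string as literal data, not a path.
csv_text <- I("Date,Location,MinTemp\n2008-12-01,Albury,13.4\n2008-12-02,Albury,7.4\n")

# Hypothetical new names, one per column, in column order.
new_names <- c("date", "location", "min_temp")

# Supply the names on load and skip the original header row,
# which would otherwise be read as the first row of data.
toy <- read_csv(csv_text, col_names = new_names, skip = 1,
                show_col_types = FALSE)

names(toy)
```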
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0