10.26 Rename Variables

20211124 We can rename variables within a pipeline. To normalise the variable names automatically see Section 10.24. If we have a set of new names to use instead of the original names, or instead of the automatically normalised names, then use dplyr::rename() in a pipeline to rename a few variables by pairing each new name with its old name, or dplyr::rename_with() to rename all of them by position. In the latter case it is often simpler to name the variables when the data is read, for example using col_names= with readr::read_csv().

The original variable names are obtained using base::names():

names(ds)

##  [1] "Date"          "Location"      "MinTemp"       "MaxTemp"      
##  [5] "Rainfall"      "Evaporation"   "Sunshine"      "WindGustDir"  
##  [9] "WindGustSpeed" "WindDir9am"    "WindDir3pm"    "WindSpeed9am" 
## [13] "WindSpeed3pm"  "Humidity9am"   "Humidity3pm"   "Pressure9am"  
## [17] "Pressure3pm"   "Cloud9am"      "Cloud3pm"      "Temp9am"      
## [21] "Temp3pm"       "RainToday"     "RISK_MM"       "RainTomorrow"

We can rename selected variables by name using dplyr::rename(), writing each pair as new=old:

# Rename the variables based on their names.

ds %>%
  rename(date=Date, location=Location, min_temp=MinTemp, max_temp=MaxTemp)

## # A tibble: 226,868 × 24
##    date       location min_temp max_temp Rainfall Evaporation Sunshine
##    <date>     <chr>       <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2008-12-01 Albury       13.4     22.9      0.6          NA       NA
##  2 2008-12-02 Albury        7.4     25.1      0            NA       NA
##  3 2008-12-03 Albury       12.9     25.7      0            NA       NA
##  4 2008-12-04 Albury        9.2     28        0            NA       NA
##  5 2008-12-05 Albury       17.5     32.3      1            NA       NA
##  6 2008-12-06 Albury       14.6     29.7      0.2          NA       NA
##  7 2008-12-07 Albury       14.3     25        0            NA       NA
##  8 2008-12-08 Albury        7.7     26.7      0            NA       NA
##  9 2008-12-09 Albury        9.7     31.9      0            NA       NA
## 10 2008-12-10 Albury       13.1     30.1      1.4          NA       NA
## # ℹ 226,858 more rows
## # ℹ 17 more variables: WindGustDir <ord>, WindGustSpeed <dbl>,
## #   WindDir9am <ord>, WindDir3pm <ord>, WindSpeed9am <dbl>, WindSpeed3pm <dbl>,
## #   Humidity9am <int>, Humidity3pm <int>, Pressure9am <dbl>, Pressure3pm <dbl>,
## #   Cloud9am <int>, Cloud3pm <int>, Temp9am <dbl>, Temp3pm <dbl>,
## #   RainToday <fct>, RISK_MM <dbl>, RainTomorrow <fct>
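
When the old and new names are already held as data, the same rename can be driven from a named character vector. The following is a minimal sketch, assuming a small stand-in tibble (toy) and a lookup vector (lookup); both are hypothetical names introduced for illustration and are not part of the original dataset:

```r
library(dplyr)

# Hypothetical three-column stand-in for the weather dataset ds.
toy <- tibble::tibble(Date     = as.Date(c("2008-12-01", "2008-12-02")),
                      Location = c("Albury", "Albury"),
                      MinTemp  = c(13.4, 7.4))

# Named character vector: the names are the new names and the
# values are the old names, mirroring new=old in rename().
lookup <- c(date="Date", location="Location", min_temp="MinTemp")

# all_of() raises an error if any old name is missing from the data.
renamed <- toy %>%
  rename(all_of(lookup))

names(renamed)
```

The variables of renamed should now be date, location and min_temp. Replacing all_of() with any_of() would instead silently skip old names that are not present.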

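To rename all the variables by their position, pass dplyr::rename_with() a function that ignores the current names and returns the complete vector of new names, one for each variable and in column order:

ds %>%
  rename_with(function(x) c('date', 'location', 'min_temp', 'max_temp', ...))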
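
As a runnable illustration of renaming by position, here is a minimal sketch on a hypothetical three-column tibble (toy), so that the full vector of new names can be written out. The names toy, new_names, by_function and by_position are assumptions made for this example only:

```r
library(dplyr)

# Hypothetical three-column stand-in for the weather dataset ds.
toy <- tibble::tibble(Date     = as.Date(c("2008-12-01", "2008-12-02")),
                      Location = c("Albury", "Albury"),
                      MinTemp  = c(13.4, 7.4))

# One new name for each column, in column order.
new_names <- c("date", "location", "min_temp")

# Guard against supplying too few or too many names.
stopifnot(length(new_names) == ncol(toy))

# The function ignores the existing names and returns all the new ones.
by_function <- toy %>%
  rename_with(function(x) new_names)

# To rename only some columns by position, rename() also accepts
# column positions on the right-hand side.
by_position <- toy %>%
  rename(date=1, min_temp=3)

names(by_function)
names(by_position)
```

In this sketch by_function has all three variables renamed, while by_position leaves Location unchanged.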

If the data is loaded using readr::read_csv() or readxl::read_excel(), for example, then supplying the new names through col_names= when the data is read is recommended instead.
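
A minimal sketch of naming the variables at load time follows. It assumes readr 2.0.0 or later, where I() marks a string as literal CSV data; the CSV content and the name csv_text are hypothetical. Because col_names= supplies the names, the file's own header line would otherwise be read as data, so skip=1 drops it:

```r
library(readr)

# Hypothetical CSV content with its original header line.
csv_text <- I("Date,Location,MinTemp\n2008-12-01,Albury,13.4\n2008-12-02,Albury,7.4\n")

# Supply the new names directly and skip the original header line.
toy <- read_csv(csv_text,
                col_names=c("date", "location", "min_temp"),
                skip=1,
                show_col_types=FALSE)

names(toy)
```

With a real file, the path would replace csv_text. readxl::read_excel() similarly accepts col_names= together with skip=.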

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0