10.16 Replacing Missing Values

20201026 See Section 10.11 for imputing (guessing) values to replace missing values.

To replace missing values (NA) in a dataset with a specific default value, such as 0 for numeric data, we can use tidyr::replace_na() within a pipeline. In the following example only the numeric columns of the dataset are modified: dplyr::across() applies the replacement to each column selected by tidyselect::where(), here those columns for which base::is.numeric() returns TRUE.

ds %>%
  mutate(across(where(is.numeric), ~replace_na(.x, 0)))
## # A tibble: 176,747 x 24
##    date       location min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>       <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2008-12-01 Albury       13.4     22.9      0.6         4.8      8.5
##  2 2008-12-02 Albury        7.4     25.1      0           4.8      8.5
##  3 2008-12-03 Albury       12.9     25.7      0           4.8      8.5
##  4 2008-12-04 Albury        9.2     28        0           4.8      8.5
##  5 2008-12-05 Albury       17.5     32.3      1           4.8      8.5
##  6 2008-12-06 Albury       14.6     29.7      0.2         4.8      8.5
##  7 2008-12-07 Albury       14.3     25        0           4.8      8.5
##  8 2008-12-08 Albury        7.7     26.7      0           4.8      8.5
##  9 2008-12-09 Albury        9.7     31.9      0           4.8      8.5
## 10 2008-12-10 Albury       13.1     30.1      1.4         4.8      8.5
## # … with 176,737 more rows, and 17 more variables: wind_gust_dir <ord>,
## #   wind_gust_speed <dbl>, wind_dir_9am <ord>, wind_dir_3pm <ord>,
## #   wind_speed_9am <dbl>, wind_speed_3pm <dbl>, humidity_9am <dbl>,
## #   humidity_3pm <dbl>, pressure_9am <dbl>, pressure_3pm <dbl>,
## #   cloud_9am <dbl>, cloud_3pm <dbl>, temp_9am <dbl>, temp_3pm <dbl>,
## #   rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>
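To see the effect on every column at once, the sketch below applies the same pipeline to a tiny hypothetical tibble (toy, not the weather dataset) that has missing values in both numeric and character columns. It assumes dplyr 1.0.0 or later (for across()) and tidyr are installed. It also shows an alternative that is part of tidyr's documented interface: calling replace_na() on the whole data frame with a named list, which gives each column its own default.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with NAs in numeric and character columns.
toy <- tibble(
  location = c("Albury", NA, "Sydney"),
  rainfall = c(0.6, NA, 1.4),
  sunshine = c(NA, 8.5, NA)
)

# Replace NA with 0 in the numeric columns only;
# the character column location keeps its NA.
num_filled <- toy %>%
  mutate(across(where(is.numeric), ~replace_na(.x, 0)))

# Alternatively, supply a per-column default as a named list
# (the names here are the toy columns, chosen for illustration).
all_filled <- toy %>%
  replace_na(list(location = "Unknown", rainfall = 0, sunshine = 0))

num_filled
all_filled
```

The across() form suits a single default for a whole class of columns, while the named list is convenient when each column needs a different value, for example a placeholder string for a character column.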


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.