10.11 Missing Value Imputation

20201026 See Section 10.16 to replace missing values with specific values.

Missing value imputation is useful but must be done with care, as it can be akin to inventing new data. We may be tempted to impute as a quick fix to avoid the warnings that some functions would otherwise raise about missing data. The function randomForest::na.roughfix() performs a simple imputation: missing values in a numeric column are replaced with the column median, and missing values in a factor column with the most frequent level. Because the function only operates on numeric and factor columns, we exclude the first two columns of the dataset (date and location) from the imputation:

# Count the number of missing values.

ds %>% is.na() %>% sum()
## [1] 464300
# No missing values in the first two columns (date and location)

ds[1:2] %>% is.na() %>% sum()
## [1] 0
# Impute missing values.

ds[3:24] %<>% randomForest::na.roughfix()

# Confirm that no missing values remain.

ds %>% is.na() %>% sum()
## [1] 0
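
To make concrete what kind of values end up in the gaps, the following is a minimal sketch of median and most-frequent-level imputation written in base R. It is an illustration of the idea only, not the randomForest implementation: the helper rough_fix_sketch() and the toy data frame are hypothetical names introduced here.

```r
# Hypothetical helper (not part of any package): fill numeric NAs with
# the column median and factor NAs with the most frequent level.
rough_fix_sketch <- function(df) {
  df[] <- lapply(df, function(x) {
    if (!anyNA(x)) return(x)                 # Nothing to fill in.
    if (is.factor(x)) {
      freq <- table(x)                       # Counts per level, NAs excluded.
      x[is.na(x)] <- names(freq)[which.max(freq)]
    } else if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    }
    x
  })
  df
}

# Toy data (assumed for illustration) with one gap in each column.
toy <- data.frame(
  temp = c(10, NA, 14, 30),
  wind = factor(c("N", "S", NA, "N"))
)

fixed <- rough_fix_sketch(toy)

fixed$temp                # 10 14 14 30
as.character(fixed$wind)  # "N" "S" "N" "N"
sum(is.na(fixed))         # 0
```

Every gap in a column receives the same value, which is why the result should be treated as a rough fix rather than as observed data.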

Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.