20180726 Further operations may be needed on the dataset prior to modelling. A common task is to deal with missing values. Here we remove observations with a missing target. As with any missing data, we should also analyse whether there is any pattern to the missing targets. A pattern may indicate a systemic data issue rather than simply randomly missing values.
# Check the dimensions to start with.

dim(ds)
## [1] 226868     24
# Identify observations with a missing target.

ds %>%
  pull(target) %>%
  is.na() ->
missing.target

# Check how many are found.

sum(missing.target)
## [1] 6774
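Before removing these observations it can be worth checking whether the missing targets cluster within some other variable. The following is a minimal sketch of one such check, assuming a small made-up dataset: the tibble `toy` and its `location` column are hypothetical and are not taken from the dataset used above. It tabulates the proportion of missing targets within each group.

```r
library(dplyr)

# Hypothetical toy data standing in for ds. The grouping variable
# (location) is an assumption made for illustration only.
toy <- tibble(
  location = c("A", "A", "A", "A", "B", "B", "B", "B"),
  target   = c("yes", "no", "no", "yes", NA, NA, "no", NA)
)

# Count and proportion of missing targets within each location.
toy %>%
  group_by(location) %>%
  summarise(n            = n(),
            n_missing    = sum(is.na(target)),
            prop_missing = mean(is.na(target))) ->
missing.by.location

print(missing.by.location)
```

If the proportion differs markedly between groups, as it does in this contrived example, the missing targets are unlikely to be randomly distributed and the cause may be worth investigating before the observations are dropped.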
# Remove observations with a missing target.

ds %<>% filter(!missing.target)

# Confirm the filter delivered the expected dataset.

dim(ds)
## [1] 220094     24
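The expected row count after the filter is the original count less the number of missing targets: 226868 - 6774 = 220094. The same check can be sketched on a small self-contained example. The tibble `toy` below is hypothetical, and this variant tests `is.na(target)` directly inside `filter()` rather than using a separately stored logical vector, which is an alternative to the approach shown above and not the one used there.

```r
library(dplyr)

# Hypothetical toy data with two missing targets out of five rows.
toy <- tibble(
  id     = 1:5,
  target = c("yes", NA, "no", NA, "yes")
)

# Record the size before filtering and the number of missing targets.
n.before  <- nrow(toy)
n.missing <- sum(is.na(toy$target))

# Keep only the observations with a known target.
toy %>%
  filter(!is.na(target)) ->
toy.complete

# Confirm the row count dropped by exactly the number of missing targets.
nrow(toy.complete) == n.before - n.missing
```

Storing the logical vector first, as done above, makes it easy to count and inspect the affected rows before they are removed. The single-step form is more compact when that inspection is not needed.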
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0