10.11 Filter Rows Having Missing Values
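20201202 To select the rows of a dataset that have a missing value in any column we use dplyr::filter() together with dplyr::across() and tidyselect::everything() to apply base::is.na() to every column. This yields one logical column per variable, which we collapse into a single logical vector within dplyr::filter() by combining the columns with the or operator (`|`) through purrr::reduce(). In the example we then shuffle the rows with dplyr::sample_frac() and select the first two columns plus four randomly chosen columns to show the result. A row displayed without any NA has its missing value in a column that was not selected for display.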
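library(dplyr)  # filter(), across(), sample_frac(), select()
library(purrr)  # reduce()
# Keep rows with an NA in any column, shuffle them, then show a few columns.
ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  sample_frac() %>%
  select(date, location, sample(3:length(vars), 4))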
## # A tibble: 114,944 × 6
##    date       location    cloud_3pm pressure_9am min_temp rain_tomorrow
##    <date>     <chr>           <int>        <dbl>    <dbl> <fct>
##  1 2009-02-27 Wollongong         NA        1018.     17.5 No
##  2 2014-09-14 Bendigo             4        1022.      3.9 No
##  3 2020-02-05 Walpole            NA        1013.     17   No
##  4 2011-02-07 Witchcliffe        NA          NA      13.5 No
##  5 2012-04-27 Watsonia            5        1024.      8.7 No
##  6 2014-04-21 Tuggeranong        NA        1022.      2.8 No
##  7 2014-12-24 GoldCoast          NA        1014.     21.6 No
##  8 2009-04-18 MountGinini        NA          NA       3   No
##  9 2014-12-10 Richmond            8        1015.     20   Yes
## 10 2017-03-25 Nhil               NA        1012.     17.1 No
## # … with 114,934 more rows
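As an aside, dplyr 1.0.4 and later provide dplyr::if_any(), which expresses the same any-column test without an explicit reduce step. The following is a minimal sketch rather than the approach used above: it runs on a small made-up tibble (the name toy and its values are hypothetical, standing in for ds) and cross-checks the result against base R's stats::complete.cases().

```r
library(dplyr)

# Hypothetical toy data standing in for ds; the values are invented.
toy <- tibble(
  location = c("Albury", "Bendigo", "Cairns", "Darwin"),
  min_temp = c(12.1, NA, 20.4, 25.0),
  rainfall = c(0.0, 1.2, NA, 3.4)
)

# Keep the rows where at least one column is NA.
toy_na <- toy %>% filter(if_any(everything(), is.na))

# Base R cross-check: rows that are not complete cases.
toy_na_base <- toy[!complete.cases(toy), ]

print(toy_na)
```

Both toy_na and toy_na_base should hold the same rows. Swapping if_any() for dplyr::if_all() would instead keep only the rows in which every column is missing.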
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0
