10.10 Filter Rows Having Missing Values

20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::filter() together with dplyr::across() and tidyselect::everything() to apply base::is.na() to every column. This yields one logical column per variable, which we combine row by row within dplyr::filter() using purrr::reduce() and the or operator (|). In the example we then shuffle the rows and randomly choose four columns, in addition to date and location, to show the result.

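ds %>%
  # Flag NAs in every column, then OR the columns together row by row.
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  # With no size given, sample_frac() returns all rows in random order.
  sample_frac() %>%
  # Keep date and location plus four randomly chosen columns.
  select(date, location, sample(3:length(vars), 4))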
## # A tibble: 133,253 × 6
##    date       location     wind_dir_9am wind_gust_dir max_temp evaporation
##    <date>     <chr>        <ord>        <ord>            <dbl>       <dbl>
##  1 2010-09-15 Sydney       W            <NA>              20.4         1.8
##  2 2014-01-13 Uluru        E            E                 39.5        NA  
##  3 2020-11-23 AliceSprings ESE          SE                36.4        37.6
##  4 2011-09-08 MountGinini  ENE          NNW                6.6        NA  
##  5 2012-03-28 Wollongong   S            NNW               21.4        NA  
##  6 2009-06-24 NorahHead    NW           NNW               18.4        NA  
##  7 2012-10-28 Wollongong   S            SSW               19.2        NA  
##  8 2019-11-14 MountGambier WNW          W                 15.4        NA  
##  9 2014-01-18 Uluru        E            E                 25.3        NA  
## 10 2020-12-22 Launceston   SE           SSE               18.5        NA  
## # … with 133,243 more rows
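
The same row selection can be sketched on a small, self-contained example. The tibble toy below is hypothetical data standing in for ds, and the sketch assumes dplyr 1.0.4 or later, where dplyr::if_any() is available as a more direct way to express "any column is missing" than reducing the result of dplyr::across(). A base R check using stats::complete.cases() is included only to confirm that both approaches pick the same rows.

```r
library(dplyr)

# Hypothetical toy data standing in for ds (not from the book).
toy <- tibble(
  location    = c("Sydney", "Uluru", "Hobart", "Perth"),
  max_temp    = c(20.4, NA, 18.5, 31.2),
  evaporation = c(1.8, 3.2, NA, 6.0)
)

# Keep the rows where at least one column is missing.
# if_any() applies is.na() to every column and ORs the results row by row.
missing_any <- toy %>%
  filter(if_any(everything(), is.na))

# Base R cross-check: rows that are not complete cases.
missing_base <- toy[!complete.cases(toy), ]

print(missing_any)
```

Here the Uluru and Hobart rows are retained because each has one missing value, while Sydney and Perth are dropped because they are complete.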


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0