10.11 Filter Rows Having Missing Values

20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::filter() together with dplyr::across() and tidyselect::everything() to apply base::is.na() to every column. This yields one logical column per variable, which we combine into a single logical vector with purrr::reduce() and the or operator (|) inside the dplyr::filter(). In the example we then shuffle the matching rows and randomly choose four of the remaining columns to show the result.

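library(dplyr)  # filter(), across(), sample_frac(), select(), %>%
library(purrr)  # reduce()

# Keep rows with an NA in any column, shuffle them, then show date,
# location and four randomly chosen columns (vars holds the column names).
ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  sample_frac() %>%
  select(date, location, sample(3:length(vars), 4))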
## # A tibble: 114,944 × 6
##    date       location    cloud_3pm pressure_9am min_temp rain_tomorrow
##    <date>     <chr>           <int>        <dbl>    <dbl> <fct>        
##  1 2009-02-27 Wollongong         NA        1018.     17.5 No           
##  2 2014-09-14 Bendigo             4        1022.      3.9 No           
##  3 2020-02-05 Walpole            NA        1013.     17   No           
##  4 2011-02-07 Witchcliffe        NA          NA      13.5 No           
##  5 2012-04-27 Watsonia            5        1024.      8.7 No           
##  6 2014-04-21 Tuggeranong        NA        1022.      2.8 No           
##  7 2014-12-24 GoldCoast          NA        1014.     21.6 No           
##  8 2009-04-18 MountGinini        NA          NA       3   No           
##  9 2014-12-10 Richmond            8        1015.     20   Yes          
## 10 2017-03-25 Nhil               NA        1012.     17.1 No           
## # … with 114,934 more rows
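
More recent releases of dplyr (1.0.4 and later) provide dplyr::if_any(), which expresses the same "any column is missing" test without the explicit reduce step. The following is a minimal sketch rather than the approach used above. The toy tibble and its column values are hypothetical stand-ins for ds, chosen only so that the example is self-contained.

```r
library(dplyr)

# Hypothetical toy data standing in for ds (not the weather dataset).
toy <- tibble(
  location = c("Albury", "Bendigo", "Cairns", "Darwin"),
  min_temp = c(12.1, NA, 20.3, 24.0),
  rainfall = c(0.0, 1.2, NA, 3.4)
)

# Keep the rows that have a missing value in at least one column.
with_na <- toy %>%
  filter(if_any(everything(), is.na))

# Cross-check with base R: complete.cases() is TRUE for rows with no
# missing values, so negating it selects the same rows.
with_na_base <- toy[!complete.cases(toy), ]

print(with_na)
```

Here the Bendigo and Cairns rows are retained, since each has an NA in one column. Replacing if_any() with if_all() would instead keep only rows where every column is missing.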


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0