10.14 Filter Rows Having Missing Values

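20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::filter() together with dplyr::across() and tidyselect::everything() to apply base::is.na() to every column. This produces one logical column per variable, which we combine within dplyr::filter() using purrr::reduce() and the or operator, so that a row is kept when at least one of its values is missing. In the example we then randomly reorder the rows and sample a few of the columns to show the result.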

ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  sample_frac() %>%
  select(date, location, sample(3:length(vars), 4))
## # A tibble: 156,731 × 6
##    date       location     min_temp humidity_9am pressure_3pm wind_dir_9am
##    <date>     <chr>           <dbl>        <int>        <dbl> <ord>       
##  1 2013-03-12 Hobart           17.1           68        1010. NNW         
##  2 2023-02-16 Witchcliffe      12.5           67        1013. NNW         
##  3 2020-03-06 MountGambier     13.2           87        1019. SE          
##  4 2019-10-25 PearceRAAF        6.5           49        1019  SSE         
##  5 2018-10-20 Albury           14             69        1011. SE          
##  6 2020-05-13 Albury            4.8           92        1023. <NA>        
##  7 2017-12-13 MountGambier     13.4           26        1002. N           
##  8 2014-07-13 Penrith           5.8           46          NA  SW          
##  9 2015-06-07 Albury            2.3          100        1026. ESE         
## 10 2013-05-01 Nuriootpa         5.6          100        1024. <NA>        
## # ℹ 156,721 more rows
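Recent dplyr releases also provide dplyr::if_any(), which expresses the same "missing in any column" test without the explicit reduce step, and they warn that dplyr::across() inside dplyr::filter() is deprecated. The following is a minimal sketch of that alternative, assuming a dplyr version that includes if_any(). The toy dataset and its column names are hypothetical and are not part of the weather data used above.

```r
library(dplyr)

# Hypothetical toy data: rows 2 and 4 each contain one missing value.
toy <- tibble(
  id   = 1:4,
  temp = c(17.1, NA, 13.2, 6.5),
  wind = c("NNW", "SE", "N", NA)
)

# Keep the rows where is.na() is TRUE for at least one column.
with_na <- toy %>%
  filter(if_any(everything(), is.na))

# Cross-check with base R: complete.cases() is TRUE for rows
# that have no missing values, so we negate it.
same_rows <- identical(with_na$id, toy$id[!complete.cases(toy)])
```

Here with_na holds the two rows with id 2 and 4, and same_rows confirms that the base R approach selects the same rows. Replacing if_any() with if_all() would instead keep only the rows where every column is missing.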


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0