10.11 Filter Rows Having Missing Values

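20201202 To select the rows of a dataset that have a missing value in any column we use dplyr::filter() together with dplyr::across() and tidyselect::everything() to apply base::is.na() to every column. This produces one logical column per variable, which we collapse into a single logical vector within dplyr::filter() by passing the or operator (`|`) to purrr::reduce(). A row is kept if at least one of its values is missing. In the example we shuffle the resulting rows with dplyr::sample_frac() and select date, location, and four randomly chosen columns to show the result.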

# Keep rows with at least one NA, shuffle them, then show a few columns.
ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  sample_frac() %>%
  select(date, location, sample(3:length(vars), 4))
## # A tibble: 148,709 × 6
##    date       location    wind_dir_9am temp_3pm humidity_9am rain_today
##    <date>     <chr>       <ord>           <dbl>        <int> <fct>     
##  1 2017-12-25 WaggaWagga  ESE              28             55 No        
##  2 2010-01-09 Richmond    <NA>             36.2           72 No        
##  3 2014-07-12 Williamtown WNW              17             56 <NA>      
##  4 2014-05-17 Richmond    <NA>             23.4           90 No        
##  5 2014-03-09 Newcastle   NE               26.5           70 No        
##  6 2011-10-19 Walpole     SSE              16.4           56 Yes       
##  7 2018-11-22 Portland    SSW              10.7           77 Yes       
##  8 2018-06-23 Watsonia    <NA>             10.8           88 No        
##  9 2018-06-01 WaggaWagga  ENE              14.5           74 No        
## 10 2020-11-23 Cobar       SSE              26.4           50 No        
## # … with 148,699 more rows
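
The same selection can also be sketched with dplyr::if_any(), available from dplyr 1.0.4, which avoids the explicit reduce step. The small tibble toy and its values are hypothetical, made up purely for illustration in place of ds, and this is offered as one possible alternative rather than the approach used above.

```r
library(dplyr)

# Hypothetical toy data standing in for ds (values invented for illustration).
toy <- tibble(
  location   = c("Canberra", "Sydney", NA, "Perth"),
  temp_3pm   = c(21.5, NA, 18.2, 30.1),
  rain_today = c("No", "Yes", "No", "No")
)

# Rows with a missing value in at least one column.
toy_missing <- toy %>%
  filter(if_any(everything(), is.na))

# The complement: rows with no missing values in any column.
toy_complete <- toy %>%
  filter(if_all(everything(), ~ !is.na(.x)))

print(toy_missing)
print(toy_complete)
```

Here if_any() combines the per-column tests with or, while if_all() combines them with and, so the two results together partition the rows of toy.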


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0