10.10 Filter Rows Having Missing Values

20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::filter() together with dplyr::across() and tidyr::everything() to apply base::is.na() to every column. This yields one logical column per variable, which we combine row by row within dplyr::filter() using purrr::reduce() and the or operator (`|`). In the example we shuffle the rows and randomly choose four of the columns to show the result.

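library(dplyr)  # filter(), across(), everything(), sample_frac(), select()
library(purrr)  # reduce()

ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%  # rows with at least one NA
  sample_frac() %>%                                        # shuffle the rows
  select(date, location, sample(3:length(vars), 4))        # plus four random columns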
## # A tibble: 114,944 x 6
##    date       location      wind_gust_speed temp_9am sunshine temp_3pm
##    <date>     <chr>                   <dbl>    <dbl>    <dbl>    <dbl>
##  1 2015-01-20 NorahHead                  48     20.2     NA       23.1
##  2 2009-10-02 Canberra                   NA     12.6     NA       16.9
##  3 2014-07-13 Williamtown                59     10.3     NA       14.4
##  4 2014-12-05 BadgerysCreek              59     21.2     NA       30.5
##  5 2018-01-30 BadgerysCreek              41     24.1     NA       35.3
##  6 2009-02-21 Albury                     28     19.3     NA       29.8
##  7 2012-01-28 Woomera                    65     26.8     NA       31.8
##  8 2015-01-18 Williamtown                37     23.8     NA       26.8
##  9 2010-03-28 MountGinini                41     13.4     NA       18.1
## 10 2009-08-10 Albany                     NA     11.5      2.3     17.1
## # … with 114,934 more rows
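
To see what the row-wise "or" is doing, it can help to try the idea on a small table. The following is a minimal sketch, not the dataset used above: the tibble `toy` and its values are made up for illustration. It builds the logical mask explicitly with purrr::map() and purrr::reduce(), then selects the same rows with dplyr::if_any() (available in dplyr 1.0.4 and later) and with base::complete.cases().

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for the weather dataset `ds`.
toy <- tibble(
  location = c("Albury", "Canberra", "Albany", "Woomera"),
  temp_9am = c(19.3, NA, 11.5, 26.8),
  sunshine = c(NA, NA, 2.3, 9.1)
)

# One logical vector per column, combined element-wise with `|`.
# TRUE marks a row with a missing value in at least one column.
any_na <- toy %>% map(is.na) %>% reduce(`|`)

# Rows with a missing value in at least one column, using if_any().
with_na <- toy %>% filter(if_any(everything(), is.na))

# The same rows using base R only.
with_na_base <- toy[!complete.cases(toy), ]

with_na$location
```

Here the first two rows are kept because each has at least one missing value. Which of the forms to prefer is largely a matter of taste and of the installed dplyr version.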


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.