10.11 Filter Rows Having Missing Values
20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::across() with tidyselect::everything() to apply base::is.na() to every column, and then combine the per-column results with purrr::reduce() and the or operator inside dplyr::filter(). In the example we shuffle the rows with dplyr::sample_frac() and randomly choose four of the remaining columns to show the result.
ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  sample_frac() %>%
  select(date, location, sample(3:length(vars), 4))
## # A tibble: 148,709 × 6
## date location wind_dir_9am temp_3pm humidity_9am rain_today
## <date> <chr> <ord> <dbl> <int> <fct>
## 1 2017-12-25 WaggaWagga ESE 28 55 No
## 2 2010-01-09 Richmond <NA> 36.2 72 No
## 3 2014-07-12 Williamtown WNW 17 56 <NA>
## 4 2014-05-17 Richmond <NA> 23.4 90 No
## 5 2014-03-09 Newcastle NE 26.5 70 No
## 6 2011-10-19 Walpole SSE 16.4 56 Yes
## 7 2018-11-22 Portland SSW 10.7 77 Yes
## 8 2018-06-23 Watsonia <NA> 10.8 88 No
## 9 2018-06-01 WaggaWagga ENE 14.5 74 No
## 10 2020-11-23 Cobar SSE 26.4 50 No
## # … with 148,699 more rows
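To see how the is.na() and reduce() steps combine, the following minimal sketch works through them on a small toy tibble. The tibble `toy` and its columns are hypothetical and are not part of the weather dataset used above. The last line shows dplyr::if_any() as an alternative way to write the same filter, assuming dplyr 1.0.4 or later is installed.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data, not the weather dataset used above.
toy <- tibble(
  id   = 1:4,
  temp = c(21.5, NA, 18.2, 30.1),
  rain = c("No", "Yes", NA, "No")
)

# Step 1: one logical vector per column, TRUE where the value is missing.
na_flags <- map(toy, is.na)

# Step 2: combine the vectors element-wise with `|`, so that a row is
# flagged TRUE when any of its columns is missing.
any_na <- reduce(na_flags, `|`)

# Step 3: keep only the flagged rows.
with_na <- toy[any_na, ]

# The same selection written with dplyr::if_any().
with_na_if_any <- toy %>% filter(if_any(everything(), is.na))
```

Both `with_na` and `with_na_if_any` should hold the two rows that have a missing value in temp or rain. Recent dplyr releases appear to favour if_any() over across() inside filter(), so the second form may be the safer choice for new code.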
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0
