9.7 Selecting Rows

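20200419 Rows of a data frame can be selected with dplyr::filter() using specific conditions. The rows in the resulting data frame will be those for which the condition is true. Rows for which the condition evaluates to NA are dropped. In the following example we keep the rows where the maximum temperature is at or above its mean: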

ds %>%
  filter(max_temp >= mean(max_temp, na.rm=TRUE))
## # A tibble: 100,299 × 24
##    date       location min_temp max_temp rainf…¹ evapo…² sunsh…³ wind_…⁴ wind_…⁵
##    <date>     <chr>       <dbl>    <dbl>   <dbl>   <dbl>   <dbl> <ord>     <dbl>
##  1 2008-12-02 Albury        7.4     25.1     0        NA      NA WNW          44
##  2 2008-12-03 Albury       12.9     25.7     0        NA      NA WSW          46
##  3 2008-12-04 Albury        9.2     28       0        NA      NA NE           24
##  4 2008-12-05 Albury       17.5     32.3     1        NA      NA W            41
##  5 2008-12-06 Albury       14.6     29.7     0.2      NA      NA WNW          56
##  6 2008-12-07 Albury       14.3     25       0        NA      NA W            50
##  7 2008-12-08 Albury        7.7     26.7     0        NA      NA W            35
##  8 2008-12-09 Albury        9.7     31.9     0        NA      NA NNW          80
##  9 2008-12-10 Albury       13.1     30.1     1.4      NA      NA W            28
## 10 2008-12-11 Albury       13.4     30.4     0        NA      NA N            30
## # … with 100,289 more rows, 15 more variables: wind_dir_9am <ord>,
## #   wind_dir_3pm <ord>, wind_speed_9am <dbl>, wind_speed_3pm <dbl>,
## #   humidity_9am <int>, humidity_3pm <int>, pressure_9am <dbl>,
## #   pressure_3pm <dbl>, cloud_9am <int>, cloud_3pm <int>, temp_9am <dbl>,
## #   temp_3pm <dbl>, rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>, and
## #   abbreviated variable names ¹​rainfall, ²​evaporation, ³​sunshine,
## #   ⁴​wind_gust_dir, ⁵​wind_gust_speed
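
To see which rows survive such a condition it can help to try it on a tiny data frame. The following is a minimal sketch, assuming a hypothetical four-row tibble named toy (not the weather dataset used above) with one missing max_temp:

```r
library(dplyr)

# Hypothetical toy data for illustration only.
toy <- tibble(location = c("A", "B", "C", "D"),
              max_temp = c(20, 30, NA, 40))

# The mean of max_temp ignoring the NA is 30, so rows B and D satisfy
# the condition. For row C the comparison evaluates to NA rather than
# TRUE and so dplyr::filter() drops that row.
hot <- toy %>%
  filter(max_temp >= mean(max_temp, na.rm=TRUE))

hot
```

Only the rows for locations B and D remain. Without na.rm=TRUE the mean itself would be NA, every comparison would be NA, and no rows would be returned.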

To select rows that have missing values, for example, use dplyr::filter() with purrr::pmap_lgl() to check, row by row, whether base::is.na() is true for base::any() column in the row. In the following example we count the number of rows with missing values using base::nrow(), formatted nicely using scales::comma():

ds %>%
  select(-date) %>%
  filter(purrr::pmap_lgl(., ~any(is.na(c(...))))) %>%
  nrow() %>% scales::comma() %>% cat("rows have missing values.\n")
## 148,709 rows have missing values.
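
The same row selection can likely be written without purrr. The following is a sketch of two alternatives, assuming dplyr 1.0.4 or later for dplyr::if_any(), and again using a hypothetical toy tibble rather than the weather dataset:

```r
library(dplyr)

# Hypothetical toy data: rows 2 and 3 each have one missing value.
toy <- tibble(date     = as.Date("2020-01-01") + 0:3,
              min_temp = c(5, NA, 7, 8),
              max_temp = c(20, 30, NA, 40))

# Keep rows where any of the remaining columns is NA, then count them.
n_missing <- toy %>%
  select(-date) %>%
  filter(if_any(everything(), is.na)) %>%
  nrow()

# A base R check of the same count: stats::complete.cases() is TRUE
# for rows with no missing values, so negate it and sum.
n_missing_base <- sum(!complete.cases(select(toy, -date)))

n_missing
```

Both approaches count 2 rows here. They operate on whole columns rather than calling a function once per row, so they may be noticeably faster than the row-wise purrr::pmap_lgl() approach on a large data frame.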


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0