10.2 Wrangling Data Review
It is always useful to remind ourselves of the dataset. Here we shuffle the rows and show the date and location together with five randomly chosen variables:
ds %>% sample_frac() %>% select(date, location, sample(3:length(vars), 5))
## # A tibble: 217,049 × 7
## date location rain_tomorrow wind_dir_3pm min_temp max_temp wind_…¹
## <date> <chr> <fct> <ord> <dbl> <dbl> <dbl>
## 1 2011-09-16 Woomera No NW 14.2 32.8 28
## 2 2014-08-25 Perth No NW 9.8 24.5 15
## 3 2012-11-26 PerthAirport No WSW 15.2 30.1 17
## 4 2018-01-17 Launceston No NW 14.5 24.9 4
## 5 2014-12-23 Nhil No WSW 16.1 26 19
## 6 2010-11-18 NorfolkIsland No SSW 18.3 22.5 17
## 7 2014-03-04 Witchcliffe No SSE 13.5 24.5 26
## 8 2017-03-31 Dartmoor No WNW 6.4 16.5 2
## 9 2014-04-19 Melbourne <NA> SW 11.7 18.1 17
## 10 2021-12-04 Mildura No S 12.5 26.8 22
## # … with 217,039 more rows, and abbreviated variable name ¹wind_speed_9am
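You can try the sampling idiom above on any data frame. The following is a minimal sketch that uses the built-in mtcars data as a stand-in for the weather dataset. The `model` column and the name `smpl` are illustrative only. A fixed seed makes the random draw repeatable. The sketch uses `slice_sample(prop = 1)`, which current dplyr documentation recommends in place of the superseded `sample_frac()`.

```r
library(dplyr)

set.seed(42)  # fix the random draw so the same sample is returned on every run

# Stand-in data (assumption): mtcars, keeping its row names as a "model" column.
ds   <- as_tibble(mtcars, rownames = "model")
vars <- names(ds)

smpl <- ds %>%
  slice_sample(prop = 1) %>%                # shuffle all rows, like sample_frac()
  select(model, sample(2:length(vars), 4))  # the identifier plus 4 random columns

smpl
```

Every call returns all 32 rows in a new order. Only the seed determines which four columns appear alongside `model`.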
glimpse(ds)
## Rows: 217,049
## Columns: 24
## $ date <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04, 2008-…
## $ location <chr> "Albury", "Albury", "Albury", "Albury", "Albury", "Alb…
## $ min_temp <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7, 13.1…
## $ max_temp <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, 31.9, …
## $ rainfall <dbl> 0.6, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0,…
## $ evaporation <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ sunshine <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ wind_gust_dir <ord> W, WNW, WSW, NE, W, WNW, W, W, NNW, W, N, NNE, W, SW, …
## $ wind_gust_speed <dbl> 44, 44, 46, 24, 41, 56, 50, 35, 80, 28, 30, 31, 61, 44…
## $ wind_dir_9am <ord> W, NNW, W, SE, ENE, W, SW, SSE, SE, S, SSE, NE, NNW, W…
## $ wind_dir_3pm <ord> WNW, WSW, WSW, E, NW, W, W, W, NW, SSE, ESE, ENE, NNW,…
## $ wind_speed_9am <dbl> 20, 4, 19, 11, 7, 19, 20, 6, 7, 15, 17, 15, 28, 24, 4,…
## $ wind_speed_3pm <dbl> 24, 22, 26, 9, 20, 24, 24, 17, 28, 11, 6, 13, 28, 20, …
## $ humidity_9am <int> 71, 44, 38, 45, 82, 55, 49, 48, 42, 58, 48, 89, 76, 65…
## $ humidity_3pm <int> 22, 25, 30, 16, 33, 23, 19, 19, 9, 27, 22, 91, 93, 43,…
## $ pressure_9am <dbl> 1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2, 1009.6…
## $ pressure_3pm <dbl> 1007.1, 1007.8, 1008.7, 1012.8, 1006.0, 1005.4, 1008.2…
## $ cloud_9am <int> 8, NA, NA, NA, 7, NA, 1, NA, NA, NA, NA, 8, 8, NA, NA,…
....
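The glimpse() output shows that some variables, such as evaporation, sunshine and cloud_9am, begin with missing values. One quick way to quantify this is to count the missing values in each column. The following is a minimal sketch that uses the built-in airquality data as a stand-in. The name `n_missing` is illustrative.

```r
library(dplyr)

aq <- as_tibble(airquality)  # stand-in dataset (assumption) that contains missing values
glimpse(aq)                  # one line per variable: its type and first few values

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
n_missing <- sort(colSums(is.na(aq)), decreasing = TRUE)
n_missing
```

Applying the same `colSums(is.na(ds))` call to the weather data would report how much of each variable is missing across all locations, not just in the first few rows.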
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0
