18.3 Algorithms Data Review

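20210103 We can review a random sample of the dataset. With no arguments, sample_frac() returns every row in random order, so the ten rows printed are a random selection from the full 176,747.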

ds %>% sample_frac()
## # A tibble: 176,747 × 24
##    date       location      min_temp max_temp rainfall evaporation sunshine
##    <date>     <chr>            <dbl>    <dbl>    <dbl>       <dbl>    <dbl>
##  1 2019-08-29 Dartmoor           2.1     12.1      7.8         4.8      8.5
##  2 2013-07-28 Townsville        16.7     26.9      0          10       10.1
##  3 2010-08-12 Bendigo            7.6     14.3      9           4.8      8.5
##  4 2018-07-08 Townsville        19.9     27.2      0           3.8      8.5
##  5 2010-01-21 Adelaide          22       40.2      0          10.2     10.3
##  6 2009-12-26 SalmonGums        18.4     38.4      0           4.8      8.5
##  7 2012-08-19 Launceston         0       13        0           4.8      8.5
##  8 2012-10-02 GoldCoast         15.7     24.1      0           4.8      8.5
##  9 2015-04-10 SydneyAirport     14.7     20.3      0.4         5.2      6  
## 10 2011-10-19 Albany            11.8     16.7      4           3.2     12.4
## # … with 176,737 more rows, and 17 more variables: wind_gust_dir <ord>,
## #   wind_gust_speed <dbl>, wind_dir_9am <ord>, wind_dir_3pm <ord>,
## #   wind_speed_9am <dbl>, wind_speed_3pm <dbl>, humidity_9am <dbl>,
## #   humidity_3pm <dbl>, pressure_9am <dbl>, pressure_3pm <dbl>,
## #   cloud_9am <dbl>, cloud_3pm <dbl>, temp_9am <dbl>, temp_3pm <dbl>,
## #   rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>
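
To draw a smaller sample that is the same on every run, one option is to set a seed and pass a size to sample_frac(). The sketch below is illustrative only: ds_demo is a made-up stand-in for ds, and its values are not taken from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for ds: illustrative values, not real observations.
ds_demo <- tibble(
  location = rep(c("Albury", "Townsville", "Adelaide", "Albany"), each = 5),
  min_temp = seq(1, 20),
  rainfall = rep(c(0, 0.4, 0, 7.8, 0), times = 4)
)

set.seed(42)  # Fix the seed so the same rows are drawn on each run.

# With no size argument all rows are returned, in random order.
shuffled <- ds_demo %>% sample_frac()

# A size of 0.25 keeps a quarter of the rows (5 of the 20 here).
smaller <- ds_demo %>% sample_frac(0.25)

print(smaller)
```

From dplyr 1.0.0 onward sample_frac() is superseded by slice_sample(), where slice_sample(prop = 0.25) serves the same purpose.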

We can also glimpse all of the variables, showing the type and first few values of each.

glimpse(ds)
## Rows: 176,747
## Columns: 24
## $ date            <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04, 2008-…
## $ location        <chr> "Albury", "Albury", "Albury", "Albury", "Albury", "Alb…
## $ min_temp        <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7, 13.1…
## $ max_temp        <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, 31.9, …
## $ rainfall        <dbl> 0.6, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0,…
## $ evaporation     <dbl> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,…
## $ sunshine        <dbl> 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5,…
## $ wind_gust_dir   <ord> W, WNW, WSW, NE, W, WNW, W, W, NNW, W, N, NNE, W, SW, …
## $ wind_gust_speed <dbl> 44, 44, 46, 24, 41, 56, 50, 35, 80, 28, 30, 31, 61, 44…
## $ wind_dir_9am    <ord> W, NNW, W, SE, ENE, W, SW, SSE, SE, S, SSE, NE, NNW, W…
## $ wind_dir_3pm    <ord> WNW, WSW, WSW, E, NW, W, W, W, NW, SSE, ESE, ENE, NNW,…
## $ wind_speed_9am  <dbl> 20, 4, 19, 11, 7, 19, 20, 6, 7, 15, 17, 15, 28, 24, 4,…
## $ wind_speed_3pm  <dbl> 24, 22, 26, 9, 20, 24, 24, 17, 28, 11, 6, 13, 28, 20, …
## $ humidity_9am    <dbl> 71, 44, 38, 45, 82, 55, 49, 48, 42, 58, 48, 89, 76, 65…
## $ humidity_3pm    <dbl> 22, 25, 30, 16, 33, 23, 19, 19, 9, 27, 22, 91, 93, 43,…
## $ pressure_9am    <dbl> 1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2, 1009.6…
## $ pressure_3pm    <dbl> 1007.1, 1007.8, 1008.7, 1012.8, 1006.0, 1005.4, 1008.2…
## $ cloud_9am       <dbl> 8, 5, 5, 5, 7, 5, 1, 5, 5, 5, 5, 8, 8, 5, 5, 0, 8, 8, …
## $ cloud_3pm       <dbl> 5, 5, 2, 5, 8, 5, 5, 5, 5, 5, 5, 8, 8, 7, 5, 5, 1, 1, …
## $ temp_9am        <dbl> 16.9, 17.2, 21.0, 18.1, 17.8, 20.6, 18.1, 16.3, 18.3, …
## $ temp_3pm        <dbl> 21.8, 24.3, 23.2, 26.5, 29.7, 28.9, 24.6, 25.5, 30.2, …
## $ rain_today      <fct> No, No, No, No, No, No, No, No, No, Yes, No, Yes, Yes,…
## $ risk_mm         <dbl> 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0, 2.2,…
## $ rain_tomorrow   <fct> No, No, No, No, No, No, No, No, Yes, No, Yes, Yes, Yes…
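
The types reported by glimpse() can also be tallied to see how many variables there are of each kind. This is a minimal sketch under assumptions: ds_demo is a hypothetical stand-in for ds with one column of each type seen above, and the ordering of the wind direction levels is arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for ds: one column per type shown by glimpse().
ds_demo <- tibble(
  date          = as.Date(c("2008-12-01", "2008-12-02", "2008-12-03")),
  location      = c("Albury", "Albury", "Albury"),
  min_temp      = c(13.4, 7.4, 12.9),
  wind_gust_dir = factor(c("W", "WNW", "WSW"),
                         levels = c("W", "WSW", "WNW"), ordered = TRUE),
  rain_today    = factor(c("No", "No", "No"), levels = c("No", "Yes"))
)

# Take the first class of each column; ordered factors report "ordered".
types <- vapply(ds_demo, function(x) class(x)[1], character(1))

# Tabulate how many variables there are of each type.
type_counts <- table(types)
print(type_counts)
```

Applied to ds itself, the same two lines would summarise the 24 columns by type.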


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0