10.44 Dataset Head and Tail

20180721 Datasets can be very large, with millions of observations and thousands of variables, so browsing every value is impractical. Instead we can review the contents of the dataset using utils::head() and utils::tail(), which return the first six and the last six observations by default.

# Review the first few observations.

head(ds) %>% print.data.frame()

##         date location min_temp max_temp rainfall evaporation sunshine
## 1 2008-12-01   Albury     13.4     22.9      0.6          NA       NA
## 2 2008-12-02   Albury      7.4     25.1      0.0          NA       NA
## 3 2008-12-03   Albury     12.9     25.7      0.0          NA       NA
## 4 2008-12-04   Albury      9.2     28.0      0.0          NA       NA
## 5 2008-12-05   Albury     17.5     32.3      1.0          NA       NA
## 6 2008-12-06   Albury     14.6     29.7      0.2          NA       NA
##   wind_gust_dir wind_gust_speed wind_dir_9am wind_dir_3pm wind_speed_9am
## 1             W              44            W          WNW             20
## 2           WNW              44          NNW          WSW              4
## 3           WSW              46            W          WSW             19
....

# Review the last few observations.

tail(ds) %>% print.data.frame()

##         date location min_temp max_temp rainfall evaporation sunshine
## 1 2022-02-22    Uluru     18.9     35.4        0          NA       NA
## 2 2022-02-23    Uluru     20.6     37.0        0          NA       NA
## 3 2022-02-24    Uluru     18.7     36.9        0          NA       NA
## 4 2022-02-25    Uluru     20.6     37.4        0          NA       NA
## 5 2022-02-26    Uluru     23.0     37.6        0          NA       NA
## 6 2022-02-27    Uluru     20.9     39.1        0          NA       NA
##   wind_gust_dir wind_gust_speed wind_dir_9am wind_dir_3pm wind_speed_9am
## 1           ENE              33          ENE          SSE             26
## 2             E              31            E          SSW             24
## 3           SSE              52            E           SE             20
....
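Six observations is only the default. As a minimal sketch, assuming a small made-up data frame (ds_demo is hypothetical and stands in for ds), the n= argument of utils::head() and utils::tail() asks for a different number of observations, and a negative n= drops observations from the other end instead.

```r
# Hypothetical stand-in for ds: 20 observations of two variables.

ds_demo <- data.frame(id = 1:20, value = (1:20)^2)

# Review a specific number of observations using n=.

head(ds_demo, n = 3)
tail(ds_demo, n = 3)

# A negative n= means "all but": all but the last 15 observations,
# then all but the first 15 observations.

head(ds_demo, n = -15)
tail(ds_demo, n = -15)
```

For a dataset with thousands of variables it can also help to select a few columns first, so that the printed rows do not wrap.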

We are steadily building a picture of the data. The first observations are for Albury and the last are for Uluru, which begins to confirm that location has multiple values, whilst date appears to be a sequence within each location.
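Impressions formed from a dozen rows can be checked directly. The following is only a sketch on a made-up data frame (ds_demo and its values are hypothetical, reusing the column names date and location from above): it counts the distinct locations and tests whether the dates within each location advance one day at a time. On real data any gap in the record would show up as FALSE for that location.

```r
# Hypothetical stand-in for ds: two locations, four consecutive days each.

ds_demo <- data.frame(
  date     = rep(seq(as.Date("2008-12-01"), by = "day", length.out = 4),
                 times = 2),
  location = rep(c("Albury", "Uluru"), each = 4)
)

# Count the distinct values of location.

n_locations <- length(unique(ds_demo$location))

# For each location, check that consecutive dates are one day apart.

daily <- sapply(split(ds_demo$date, ds_demo$location),
                function(d) all(diff(d) == 1))

n_locations
daily
```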

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0