10.29 Dataset Head and Tail

20180721 Datasets can be very large, with millions of observations and thousands of variables, so browsing every observation and variable is impractical. Instead we can review a sample of the contents using utils::head() and utils::tail(), which by default return the first six and the last six observations respectively.

# Review the first few observations.

head(ds) %>% print.data.frame()
##         date location min_temp max_temp rainfall evaporation sunshine
## 1 2008-12-01   Albury     13.4     22.9      0.6          NA       NA
## 2 2008-12-02   Albury      7.4     25.1      0.0          NA       NA
## 3 2008-12-03   Albury     12.9     25.7      0.0          NA       NA
## 4 2008-12-04   Albury      9.2     28.0      0.0          NA       NA
## 5 2008-12-05   Albury     17.5     32.3      1.0          NA       NA
## 6 2008-12-06   Albury     14.6     29.7      0.2          NA       NA
##   wind_gust_dir wind_gust_speed wind_dir_9am wind_dir_3pm wind_speed_9am
## 1             W              44            W          WNW             20
## 2           WNW              44          NNW          WSW              4
## 3           WSW              46            W          WSW             19
## 4            NE              24           SE            E             11
## 5             W              41          ENE           NW              7
## 6           WNW              56            W            W             19
##   wind_speed_3pm humidity_9am humidity_3pm pressure_9am pressure_3pm cloud_9am
## 1             24           71           22       1007.7       1007.1         8
## 2             22           44           25       1010.6       1007.8        NA
## 3             26           38           30       1007.6       1008.7        NA
## 4              9           45           16       1017.6       1012.8        NA
## 5             20           82           33       1010.8       1006.0         7
## 6             24           55           23       1009.2       1005.4        NA
##   cloud_3pm temp_9am temp_3pm rain_today risk_mm rain_tomorrow
## 1        NA     16.9     21.8         No     0.0            No
## 2        NA     17.2     24.3         No     0.0            No
## 3         2     21.0     23.2         No     0.0            No
## 4        NA     18.1     26.5         No     1.0            No
## 5         8     17.8     29.7         No     0.2            No
## 6        NA     20.6     28.9         No     0.0            No

# Review the last few observations.

tail(ds) %>% print.data.frame()
##         date location min_temp max_temp rainfall evaporation sunshine
## 1 2020-04-24    Uluru     23.0     36.7        0          NA       NA
## 2 2020-04-25    Uluru     18.4     37.4        0          NA       NA
## 3 2020-04-26    Uluru     21.4     32.7        0          NA       NA
## 4 2020-04-27    Uluru     19.4     32.2        0          NA       NA
## 5 2020-04-28    Uluru     16.6     32.6        0          NA       NA
## 6 2020-04-29    Uluru     16.7     25.7        0          NA       NA
##   wind_gust_dir wind_gust_speed wind_dir_9am wind_dir_3pm wind_speed_9am
## 1           ESE              31            E            E             22
## 2             W              35            E          NNW              9
## 3             S              54          ESE          ESE             20
## 4            SE              41            E           NE             28
## 5           SSW              44            S            W             19
## 6           SSW              54            S           SW             31
##   wind_speed_3pm humidity_9am humidity_3pm pressure_9am pressure_3pm cloud_9am
## 1              9           30           17       1017.3       1013.4        NA
## 2             15           27           14       1016.8       1012.3        NA
## 3             17           41           24       1017.7       1013.9         4
## 4             13           59           27       1018.7       1013.2         4
## 5             15           37           25       1015.3       1012.0        NA
## 6             26           69           28       1015.9       1014.4        NA
##   cloud_3pm temp_9am temp_3pm rain_today risk_mm rain_tomorrow
## 1         2     27.4     35.6         No       0            No
## 2        NA     26.4     36.4         No       0            No
## 3         4     24.5     32.0         No       0            No
## 4        NA     21.4     30.5         No       0            No
## 5        NA     24.0     31.3         No       0            No
## 6        NA     18.8     25.1         No       0            No
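
The default of six observations can be changed through the n argument of head() and tail(). The following is a minimal sketch. It uses a small made-up data frame named toy (hypothetical, not part of the weather dataset) so that it runs on its own; the same calls should apply to ds.

```r
# A toy stand-in for ds (hypothetical values, for illustration only).
toy <- data.frame(id = 1:20, value = (1:20)^2)

# First three observations rather than the default six.
first3 <- head(toy, n = 3)
print(first3)

# Last two observations.
last2 <- tail(toy, n = 2)
print(last2)

# A negative n drops rows instead: all but the last 15 observations.
all_but_last15 <- head(toy, n = -15)
print(all_but_last15)
```

A negative n is handy when the number of rows to skip is known but the number to keep is not.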

We are steadily building a picture of the data. The output supports the view that location takes multiple values (Albury in the first observations, Uluru in the last) and that date forms a daily sequence within each location.
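
Looking at twelve rows can only suggest these patterns, so we might check them directly. The sketch below is one possible way to do so in base R, not necessarily the approach taken elsewhere in this book. It uses a small made-up data frame named toy (hypothetical) with the same date and location columns as ds, so that it is self-contained.

```r
# A toy stand-in for ds with only the two variables of interest
# (hypothetical rows, for illustration only).
toy <- data.frame(
  date     = as.Date(c("2008-12-01", "2008-12-02", "2008-12-03",
                       "2020-04-27", "2020-04-28", "2020-04-29")),
  location = c("Albury", "Albury", "Albury", "Uluru", "Uluru", "Uluru")
)

# How many distinct locations are there, and how many rows for each?
n_locations <- length(unique(toy$location))
print(n_locations)
print(table(toy$location))

# Split the dates by location, then test each group separately.
dates_by_location <- split(toy$date, toy$location)

# Are the dates strictly increasing within each location?
increasing <- sapply(dates_by_location,
                     function(d) all(diff(as.numeric(d)) > 0))
print(increasing)

# Are consecutive dates exactly one day apart (no gaps)?
daily <- sapply(dates_by_location,
                function(d) all(diff(as.numeric(d)) == 1))
print(daily)
```

On a real dataset the second check may well return FALSE for some locations, which would point to missing days rather than to an error in the code.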



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.