2.7 A Glimpse of the Dataset

Another useful command that we will find ourselves using often is glimpse() from the dplyr (Wickham, François, et al. 2023) package. Once dplyr has been installed, the command can be accessed in the R console as dplyr::glimpse(). It accepts an argument x= which names the dataset we wish to glimpse. In the following R example we use the weatherAUS dataset from the rattle (G. Williams 2023) package.

# Review the dataset.

dplyr::glimpse(x=rattle::weatherAUS)
## Rows: 226,868
## Columns: 24
## $ Date          <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04, 2008-12…
## $ Location      <chr> "Albury", "Albury", "Albury", "Albury", "Albury", "Albur…
## $ MinTemp       <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7, 13.1, …
## $ MaxTemp       <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, 31.9, 30…
## $ Rainfall      <dbl> 0.6, 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0, 2…
## $ Evaporation   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Sunshine      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ WindGustDir   <ord> W, WNW, WSW, NE, W, WNW, W, W, NNW, W, N, NNE, W, SW, NA…
## $ WindGustSpeed <dbl> 44, 44, 46, 24, 41, 56, 50, 35, 80, 28, 30, 31, 61, 44, …
## $ WindDir9am    <ord> W, NNW, W, SE, ENE, W, SW, SSE, SE, S, SSE, NE, NNW, W, …
## $ WindDir3pm    <ord> WNW, WSW, WSW, E, NW, W, W, W, NW, SSE, ESE, ENE, NNW, S…
## $ WindSpeed9am  <dbl> 20, 4, 19, 11, 7, 19, 20, 6, 7, 15, 17, 15, 28, 24, 4, N…
## $ WindSpeed3pm  <dbl> 24, 22, 26, 9, 20, 24, 24, 17, 28, 11, 6, 13, 28, 20, 30…
## $ Humidity9am   <int> 71, 44, 38, 45, 82, 55, 49, 48, 42, 58, 48, 89, 76, 65, …
## $ Humidity3pm   <int> 22, 25, 30, 16, 33, 23, 19, 19, 9, 27, 22, 91, 93, 43, 3…
## $ Pressure9am   <dbl> 1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2, 1009.6, …
## $ Pressure3pm   <dbl> 1007.1, 1007.8, 1008.7, 1012.8, 1006.0, 1005.4, 1008.2, …
## $ Cloud9am      <int> 8, NA, NA, NA, 7, NA, 1, NA, NA, NA, NA, 8, 8, NA, NA, 0…
## $ Cloud3pm      <int> NA, NA, 2, NA, 8, NA, NA, NA, NA, NA, NA, 8, 8, 7, NA, N…
## $ Temp9am       <dbl> 16.9, 17.2, 21.0, 18.1, 17.8, 20.6, 18.1, 16.3, 18.3, 20…
## $ Temp3pm       <dbl> 21.8, 24.3, 23.2, 26.5, 29.7, 28.9, 24.6, 25.5, 30.2, 28…
## $ RainToday     <fct> No, No, No, No, No, No, No, No, No, Yes, No, Yes, Yes, Y…
## $ RISK_MM       <dbl> 0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0, 1.4, 0.0, 2.2, 1…
## $ RainTomorrow  <fct> No, No, No, No, No, No, No, No, Yes, No, Yes, Yes, Yes, …

We can see here the command that was run and the output from running that command. As a convention in this book, the output from running R commands is prefixed with ##. A # introduces a comment in an R script file and tells R to ignore everything that follows on that line, so the ## prefix clearly identifies output produced by R without turning it into something R would try to run. When we run these commands ourselves in R this prefix is not displayed.
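The comment behaviour described above can be tried with a small sketch in base R. The variable name total is purely illustrative and does not come from the book.

```r
# A line starting with # is a comment, so R ignores this whole line.
total <- 1 + 2  # R also ignores everything after the # on this line.

# Printing the value at the console shows the result with no ## prefix.
print(total)
```

Following the book's convention, the result would be presented here as ## [1] 3, whereas the R console itself displays only [1] 3.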

Long lines of output are also truncated for our presentation here. The … at the end of a line indicates that the line has been truncated. Where a .... appears at the end of the output, it indicates that the remaining output has been omitted. Both keep our example output to a minimum.
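How much of each line is shown before it is truncated can be controlled when we run the command ourselves. The following is a minimal sketch, assuming the dplyr package is installed. It uses the mtcars dataset that ships with R as a small stand-in for weatherAUS, and the width= argument of glimpse() to set the number of characters available for each line of output. The widths of 50 and 100 are arbitrary choices for illustration.

```r
# mtcars ships with R and stands in here for a larger dataset.
# A narrow width truncates each line of output sooner.
dplyr::glimpse(x=mtcars, width=50)

# A wider width shows more values on each line before truncating.
dplyr::glimpse(x=mtcars, width=100)
```

When width= is not supplied, glimpse() falls back on the width option of the R session or the width of the console.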

The dataset we have glimpsed is a data frame, which is the basic data structure used to store a dataset within R, here enhanced by the tidyverse to add functionality that improves our interactions with the data frame.
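We can check what kind of object a dataset is from the R console. The sketch below assumes that the tidyverse enhancement referred to above is the tibble, provided by the tibble package, which is installed alongside dplyr. It again uses mtcars as a stand-in, and the variable name cars is illustrative only.

```r
# Convert a base R data frame (mtcars ships with R) into a tibble.
cars <- tibble::as_tibble(mtcars)

# A tibble is still a data frame, with extra classes layered on top.
is.data.frame(cars)
class(cars)

# The same check can be applied to the dataset used in this section,
# assuming the rattle package is installed:
# class(rattle::weatherAUS)
```

Because the tibble keeps data.frame among its classes, functions written for ordinary data frames generally continue to work with it.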

References

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Williams, Graham. 2023. rattle: Graphical User Interface for Data Science in R. https://rattle.togaware.com/.


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0