11 Data Visualisation
20240728
One of the most important tasks for any data scientist is to visualise data. Presenting data visually will often lead to new insights and discoveries, as well as providing glimpses of any issues with the data itself. Sometimes these will stand out in a visual presentation whilst being hidden in textual views. A visual presentation is also an effective means for communicating insight to others.
The simplest of plots is a Scatter Plot, which plots points onto the graph. A Line Graph joins the points. A Bar Chart plots bars up to the points. A Box Plot presents the distribution of the values of a variable. A Pie Chart is not usually recommended, since angles and areas are harder to compare accurately than lengths. Faceted plots consist of multiple graphs of the same data, one for each level of one or more factors. Plots can be coloured or annotated with text. Indeed, we are free to innovate with visualisations, though we need to be aware of the principles of effective communication.
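As a minimal sketch of the basic plot types described above, the following uses only the traditional graphics that ship with R. The choice of the built-in mtcars and pressure datasets, and the variable names, are assumptions made purely for illustration.

```r
# Arrange four plots in a 2 x 2 grid, remembering the old settings.
op <- par(mfrow = c(2, 2))

# Scatter plot: one point per observation.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Scatter Plot")

# Line graph: the same kind of plot with the points joined.
plot(pressure$temperature, pressure$pressure, type = "l",
     xlab = "Temperature", ylab = "Pressure",
     main = "Line Graph")

# Bar chart: one bar per category, here the count of cars by cylinders.
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Count",
        main = "Bar Chart")

# Box plot: the distribution of a variable, split by a factor.
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Cylinders", ylab = "Miles per gallon",
              main = "Box Plot")

# Restore the previous graphics settings.
par(op)
```

When run non-interactively, R draws to its default file device rather than to a window.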
R is capable of producing excellent publication-ready graphics in many formats. These include vector formats like Scalable Vector Graphics (svg), PostScript (eps), and Portable Document Format (pdf), and raster formats like Portable Network Graphics (png) and the Joint Photographic Experts Group format (jpg), which suits photographic images.
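The graphics devices that ship with R can write the same plot to several of these formats. The sketch below is one way to do this, not the only one. The helper save_plot(), the file names, and the use of tempdir() are hypothetical choices for illustration, and the png device is only used if the local R build supports it.

```r
# Hypothetical output location for this sketch.
out_dir <- tempdir()

# Hypothetical helper: open a device, draw the plot, close the device.
save_plot <- function(device, file, ...) {
  device(file, ...)
  on.exit(dev.off())  # always close the device, even on error
  plot(mtcars$wt, mtcars$mpg,
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
  invisible(file)
}

# Vector formats: sizes are in inches.
pdf_file <- save_plot(pdf, file.path(out_dir, "scatter.pdf"),
                      width = 6, height = 4)
eps_file <- save_plot(postscript, file.path(out_dir, "scatter.eps"),
                      width = 6, height = 4, horizontal = FALSE,
                      onefile = FALSE, paper = "special")

# Raster format: sizes are in pixels by default.
if (capabilities("png")) {
  png_file <- save_plot(png, file.path(out_dir, "scatter.png"),
                        width = 600, height = 400)
}
```

The svg() and jpeg() devices follow the same open, draw, close pattern where the R build provides them.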
R offers a comprehensive suite of tools to visualise data, including graphics (traditional graphics), grid, and lattice (Sarkar 2024) (high-level graphics).
A more recent plotting development, ggplot2 (Wickham et al. 2024), is based on the idea of a grammar of graphics: a grammar for writing sentences that describe a graphic. The default theme for producing plots is based on the collective wisdom of many visual presentation researchers. Using this package we construct a plot beginning with the dataset along with the aesthetics (e.g., the variables mapped to the x-axis and y-axis) and then add geometric elements, statistical operations, scales, facets, coordinates, and numerous other components to the plot. The official ggplot2 documentation is extensive and accessible.
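The layered construction described above can be sketched as follows. This is a minimal illustration that assumes the ggplot2 package is installed. The built-in mtcars dataset and the particular variables, labels, and theme are assumptions chosen for the example.

```r
library(ggplot2)

# Begin with the dataset and the aesthetics (x-axis and y-axis).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Geometric element: points, coloured by transmission type.
  geom_point(aes(colour = factor(am))) +
  # Statistical operation: a fitted linear trend in each panel.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Scale: name the legend and label its levels (0 = automatic, 1 = manual).
  scale_colour_discrete(name = "Transmission",
                        labels = c("automatic", "manual")) +
  # Facets: one panel per number of cylinders.
  facet_wrap(~ cyl) +
  # Other components: axis labels and an alternative theme.
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

# Printing the object renders the plot.
print(p)
```

Each component is added with +, so a layer can be removed or swapped without touching the rest of the plot.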
Claus O. Wilke’s book Fundamentals of Data Visualization provides an extensive and detailed guide to data visualisation. The book is written in R Markdown and its source is available through GitHub.
References
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0