2 Introducing R and Rattle

Data scientists write programs to ingest, fuse, clean, wrangle, visualise, analyse, and model data. Programming over data is a core task for the data scientist. We will primarily use R (R Core Team 2024) and Rattle for our data science activities. While familiarity with R is not strictly required, some basic familiarity may be gained from the many resources available on the Internet, particularly from https://cran.r-project.org/manuals.html.

Rattle, on the other hand, requires little experience and aims to walk you through the process of analysing new data as a data scientist, using a simple graphical user interface. Rattle will generate for you all the R code you need to ingest, clean, wrangle, visualise, analyse, and model your data. Even more useful is Rattle’s Script tab, which captures all of the R code into a script file that you can learn R from and later run directly using R.

Back in the world of R, we focus particularly on the tidyverse and the data pipeline as our programming paradigm. The development of the tidyverse has been instrumental in bringing R into the modern data science era, and the resources provided by the tidyverse community are extensive.
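To give a flavour of the data pipeline paradigm, here is a minimal sketch. It assumes the dplyr package (part of the tidyverse) is installed, and it uses the mtcars dataset that ships with R as a stand-in for our own data. The name summary_by_cyl and the threshold of 100 horsepower are chosen purely for illustration.

```r
library(dplyr)

# Each step receives the result of the previous step through the pipe (%>%).
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                       # keep the more powerful cars
  group_by(cyl) %>%                          # one group per cylinder count
  summarise(n = n(), mean_mpg = mean(mpg)) %>%  # count and average per group
  arrange(desc(mean_mpg))                    # most economical group first

print(summary_by_cyl)
```

Reading the pipeline from top to bottom mirrors how we might describe the analysis in words: take the data, filter it, group it, summarise it, then sort the result.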

For developing R code directly, the recommended environment is either VS Code or RStudio. As you develop your data analyses, be sure to have the RStudio cheatsheets for the tidyverse in front of you. You will find them invaluable. Visit https://rstudio.com/resources/cheatsheets/.

Programmers of data write sentences, or code, that instruct a computer to perform specific tasks. A collection of such sentences written in a language is what we might call a program. Through programming by example and learning by immersion, we will share programs to deliver insights and outcomes from our data.

R is a large and complex ecosystem for the practice of data science. There is much freely available information on the Internet from which we can continually learn and borrow useful code segments that illustrate almost any task we might think of. We introduce here the basics for getting started with R, libraries and packages which extend the language, and the concepts of functions, commands, and operators.
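As a first taste of these concepts, the following minimal sketch uses only base R. The values in heights and the function name to_cm are hypothetical, introduced here purely for illustration.

```r
# A package extends the language; library() attaches an installed package.
# The stats package ships with R, so this works on any installation.
library(stats)

# A function call is a function name followed by arguments in parentheses.
heights <- c(1.62, 1.75, 1.80)   # c() combines values into a vector
avg <- mean(heights)             # mean() is a function; <- is the assignment operator

# Comparison and arithmetic operators work element by element on vectors.
tall <- heights > avg

# We can also define our own functions.
to_cm <- function(metres) {
  metres * 100
}

print(tall)
print(to_cm(heights))
```

Each line typed at the R prompt is a command that R evaluates, and commands are built from function calls and operators applied to data.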

References

R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0