1.1 Data Wrangling

Wikipedia (http://en.wikipedia.org/wiki/Data_wrangling) defines data wrangling as the process of manually converting or mapping data, with the help of semi-automated tools, from one “raw” form into another format that allows for more convenient consumption of the data.

Those who do this are data wranglers.

In this introductory chapter we present the concepts of data science and analytics, and the role and toolkit of the data scientist. We identify the progression from data technician, through data analyst and data miner, to data scientist. The remaining chapters of this book then provide a hands-on guide to the key tools for doing data science using R (R Core Team 2023), the most powerful software system for doing data science. R is today supported by all of the major vendors and is the backbone of the emerging data science platforms in the cloud.


R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0