10.32 Transforming Data in Rattle

20240811

The Transform tab in Rattle provides numerous options for transforming our datasets. Cleaning our data and creating new features from the data occupy much of our time as data miners. There are myriad approaches, and a programming language like R supports them all. Through the Rattle user interface we can perform some of the more common transformations. These include normalising our data, filling in missing values, turning numeric variables into categoric variables and vice versa, dealing with outliers, and removing variables or observations with missing values. More complex transformations are then available through R.
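To make the common transformations listed above concrete, here is a minimal sketch in base R. It is not the code that Rattle itself generates. The toy data frame and the derived column names (max_temp_01, min_temp_imp, max_temp_bin, ds_complete) are hypothetical, chosen only for illustration.

```r
# Hypothetical toy data with weather-style column names.
ds <- data.frame(
  min_temp = c(8.0, 14.0, NA, 13.3, 7.6),
  max_temp = c(24.3, 26.9, 23.4, 15.5, 16.1)
)

# Rescale max_temp to the range [0, 1].
rng <- range(ds$max_temp, na.rm = TRUE)
ds$max_temp_01 <- (ds$max_temp - rng[1]) / (rng[2] - rng[1])

# Impute missing min_temp values with the median of the observed values.
ds$min_temp_imp <- ifelse(is.na(ds$min_temp),
                          median(ds$min_temp, na.rm = TRUE),
                          ds$min_temp)

# Recode numeric max_temp into a categoric variable with three equal-width bins.
ds$max_temp_bin <- cut(ds$max_temp, breaks = 3, labels = c("low", "mid", "high"))

# Alternatively, drop the observations that have a missing min_temp.
ds_complete <- ds[!is.na(ds$min_temp), ]
```

Each transformation is written to a new column rather than overwriting the original, so the untransformed values remain available for comparison.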

Transformations are not always appropriate, so we indicate where they might be applicable and warn about the pitfalls of the different approaches. This is particularly important for imputation, which can significantly alter the distribution of our data.
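The effect of imputation on a distribution can be seen in a small simulation. This is a sketch under stated assumptions: the data are synthetic, 30% of the values are removed completely at random, and the mean is used as the imputed value. The variable names x, x_miss, and x_imp are hypothetical.

```r
set.seed(42)

# Hypothetical complete variable drawn from a normal distribution.
x <- rnorm(1000, mean = 20, sd = 5)

# Remove 30% of the values at random to simulate missing data.
x_miss <- x
x_miss[sample(length(x), 300)] <- NA

# Replace every missing value with the mean of the observed values.
x_imp <- ifelse(is.na(x_miss), mean(x_miss, na.rm = TRUE), x_miss)

# The mean is preserved but the spread shrinks, because 300 values
# now sit exactly at the centre of the distribution.
mean(x_miss, na.rm = TRUE)
mean(x_imp)
sd(x_miss, na.rm = TRUE)
sd(x_imp)
```

The standard deviation after imputation is always smaller than that of the observed values alone, since the imputed values add nothing to the sum of squared deviations while increasing the number of observations.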

In tuning our dataset to suit our needs we often transform it in many different ways. Once we have transformed our dataset we will want to save the new version. After working on our dataset through the Transform tab we can save the data through the Console tab:

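# Keep the columns of interest and write them to a CSV file.
library(magrittr)  # Provides the %>% pipe operator.
ds %>%
  dplyr::select(date, location, min_temp, max_temp, temp_9am, temp_3pm) %>%
  readr::write_csv('my_new_dataset.csv')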

See Section 10.15 for details of the Impute feature of Rattle (and Sections 10.22 and 10.27 for general discussion), Section 10.28 for the Rescale feature, Section ?? for the Recode feature, and Section ?? for the Cleanup feature.



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0