7 R Read, Write, and Create

20200104 Data is available in an enormous variety of formats and is stored in many different locations, including our own data stores, local disks, cloud stores, database systems, and Internet sites like https://data.gov, https://data.gov.uk/, and https://data.gov.au/. A major task we face as data scientists is ingesting that data into R so that we can perform our analyses. R provides extensive capabilities for reading (also known as importing or ingesting) data and for writing (also known as exporting) data.

Here we explore the options, including a simple and widely used format known as comma separated values (CSV). A variation of this is tab separated values (TSV), and indeed other special characters can be used to separate the columns of the data.
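As a minimal sketch of the idea, the following writes a small data frame to a CSV file and to a tab separated file and then reads both back in, using only functions from base R. The data frame, its column names, and the temporary file names are made up for illustration and are not part of any dataset used in this book.

```r
# A small illustrative data frame (column names and values are made up).
ds <- data.frame(id   = 1:3,
                 city = c("Canberra", "Sydney", "Perth"),
                 temp = c(21.5, 24.0, 27.3))

# Use temporary files so the sketch leaves nothing behind.
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")

# Comma separated values.
write.csv(ds, csv_file, row.names = FALSE)

# Tab separated values: the same idea with a different separator.
write.table(ds, tsv_file, sep = "\t", row.names = FALSE)

# Read both files back into R.
from_csv <- read.csv(csv_file)
from_tsv <- read.delim(tsv_file)

# Any other single-character separator can be named explicitly via sep=.
from_tsv2 <- read.table(tsv_file, header = TRUE, sep = "\t")
```

The three functions used for reading differ mainly in their defaults: read.csv() assumes a comma, read.delim() assumes a tab, and read.table() takes whatever separator is given through sep=.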

But data comes in an amazing variety of formats, including many proprietary formats that need special effort to decipher. R supports most commonly encountered formats through its many different packages. This chapter introduces the numerous options available.

Also included is a guide to creating your own random dataset for testing ideas and building systems when the actual data is not readily available, for example because of privacy constraints.
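A random dataset of this kind might be sketched as follows, using only base R. The variable names, value ranges, and distribution parameters here are hypothetical placeholders chosen for illustration, not a description of any real dataset.

```r
# Fix the seed so that the "random" data is reproducible.
set.seed(42)

# Number of synthetic observations (an arbitrary choice).
n <- 100

# Hypothetical variables standing in for a real, private dataset.
ds <- data.frame(
  id     = seq_len(n),
  age    = sample(18:90, n, replace = TRUE),
  gender = sample(c("female", "male"), n, replace = TRUE),
  income = round(rnorm(n, mean = 60000, sd = 15000)),
  stringsAsFactors = FALSE
)

# Review the structure and a summary of the generated data.
str(ds)
summary(ds)
```

Setting the seed means that anyone running the code obtains the same dataset, which is useful when sharing a test case in place of the actual data.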



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0