7.3 CSV Data

20201022 The Comma Separated Values (CSV) format remains one of the simplest and most common ways of sharing data. Because the format is so simple, csv files have become a standard for exchanging data between many different applications.

csv files can, for example, be exported and imported by numerous spreadsheets, databases, and other applications, including Rattle, LibreOffice Calc, Gnumeric, MS/Excel, SAS/Enterprise Miner, Teradata, Netezza, and many others.

The downside of the csv format is that the file does not contain explicit metadata (i.e., data about the data), such as the data types of the different columns. Typically we would like to know whether a column contains numeric or character data. If the data is numeric, does it represent dates or dollars? If it is character, is it categoric (a factor)? Without this metadata the R commands for reading the data have to guess, and will sometimes determine the wrong data type for a particular column. Options are available to reduce this possibility or to provide the metadata.

Reading csv files is straightforward using readr::read_csv():

library(readr)        # Modern and efficient data reader/writer.

ds <- read_csv("mydata.csv")
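As a minimal sketch of how the guessed types can be checked, the following writes a small made-up file to a temporary location (the column names and values are purely illustrative, not from any dataset in this book) and then uses readr::spec() to report the column types that read_csv() settled on. A postcode, for instance, looks numeric but is arguably categoric, which the reader has no way of knowing from the file alone.

```r
library(readr)        # Modern and efficient data reader/writer.

# Hypothetical sample data, written to a temporary file for illustration.

fname <- tempfile(fileext=".csv")

writeLines(c("postcode,rainfall,raintoday",
             "2600,0.6,No",
             "2000,12.4,Yes"),
           fname)

ds <- read_csv(fname)

# Report the column specification that was guessed for each column.

spec(ds)
```

If a guess turns out to be unsuitable, the column type can be supplied when reading the file.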

Column types can be specified using col_types=, for example col_types="inffcD", a compact string with one letter per column. The string can contain any of c (character), i (integer), n (number), d (double), l (logical), f (factor), D (date), T (date and time), t (time), ? (guess), and _ or - (skip the column).
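A sketch of the compact string in use follows, assuming a made-up six-column file written to a temporary location (the file contents and column names are hypothetical). Each letter of "inffcD" is matched, in order, to one column.

```r
library(readr)        # Modern and efficient data reader/writer.

# Hypothetical sample data with six columns, one per letter of "inffcD".

fname <- tempfile(fileext=".csv")

writeLines(c("id,amount,state,grade,comment,opened",
             "1,$1250.50,NSW,A,first,2020-10-22",
             "2,$99,ACT,B,second,2020-10-23"),
           fname)

# i=integer, n=number, f=factor, f=factor, c=character, D=date.

ds <- read_csv(fname, col_types="inffcD")

# Check the resulting class of each column.

sapply(ds, class)
```

The n (number) type is the more forgiving numeric parser, dropping surrounding non-numeric characters such as the currency symbol here, whereas d (double) expects a plain number.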

Writing a dataset to a csv file is straightforward using readr::write_csv():

library(rattle)       # Dataset: weatherAUS.
library(dplyr)        # Wrangling: select().
library(readr)        # Modern and efficient data reader/writer.

ds <- weatherAUS

fname  <- "temperatureAUS.csv"

ds %>%
  select(Date, Location, MinTemp, MaxTemp, Temp9am, Temp3pm) %>%
  write_csv(fname)
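Because the csv file carries no metadata, type information such as factors is lost on the way out and has to be supplied again on the way back in. The round trip below is a sketch of this, using R's built-in iris dataset rather than weatherAUS simply so that it runs without rattle installed, and a temporary file name chosen for illustration.

```r
library(readr)        # Modern and efficient data reader/writer.

fname <- tempfile(fileext=".csv")

write_csv(iris, fname)

# Reading the file back with guessed types: Species is no longer a factor.

guessed <- read_csv(fname)
class(guessed$Species)

# Naming the column type restores the factor; other columns are still guessed.

typed <- read_csv(fname, col_types=cols(Species=col_factor(levels=NULL)))
class(typed$Species)
```

With levels=NULL the factor levels are taken from the values found in the file, so a particular ordering of levels would need to be listed explicitly.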

To suppress the message that reports the column types identified when a file is read, set the option for the number of columns to report to 0:

options(readr.num_columns=0)

To turn it back on, set it to NULL.
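Where the setting is only wanted for part of a script, one approach is to keep the value that options() returns and hand it back afterwards. This is a sketch using base R only; the placeholder comment stands in for whatever reading is being done.

```r
# Remember the current setting while turning the reporting off.

old <- options(readr.num_columns=0)

# ... calls to read_csv() placed here run with the reporting turned off ...

# Restore whatever setting was in place before, including NULL if unset.

options(old)
```

Recent releases of readr also accept show_col_types=FALSE as an argument to read_csv(), which may be the more direct choice for a single call.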

Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.