10.18 Data Source
20180908 We begin by identifying a data source and choose the simplest of sources: a text-based CSV (comma-separated values) file, which is a typical data source format.
The rattle::weatherAUS dataset from rattle (G. Williams 2020) will again be used. A binary formatted R dataset is provided by the package but the CSV file for the same dataset is available at https://rattle.togaware.com/weatherAUS.csv.
Identify and record the location of the CSV file to analyse. R can
ingest data directly from the Internet and so we will illustrate
that here. The location of the file (the so-called URL or uniform
resource locator) will be saved as a string in a variable called
dspath, the path to the dataset. The following assignment
command does this for us. Simply type this into your R script file
within RStudio. The command is then executed in RStudio by clicking
the Run button whilst the cursor is situated on the
line within the script file.
# Note the source location of a dataset to ingest into R.

dspath <- "http://rattle.togaware.com/weatherAUS.csv"
The assignment operator <- will store the value on the right hand
side (which is a string enclosed within quotation marks) into the
computer’s memory and we can later refer to it as the R variable
dspath. We retrieve the string simply by reference to the variable.
By typing the name of the variable (dspath) in the R Console at the
> prompt, R will respond with the value stored in the variable:
## [1] "http://rattle.togaware.com/weatherAUS.csv"
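As a small illustration of working with the stored value, the following sketch inspects dspath using functions from base R and the tools package. These checks are illustrative only and are not part of the original script.

```r
# Store the location of the dataset as a string.
dspath <- "http://rattle.togaware.com/weatherAUS.csv"

# The variable holds a single character string.
class(dspath)
length(dspath)

# Extract the file name and its extension from the path.
basename(dspath)
tools::file_ext(dspath)

# Check that the string looks like a web address.
grepl("^https?://", dspath)
```

Here basename() returns the file name weatherAUS.csv and tools::file_ext() returns csv, which is a quick way to confirm that the variable records what we intended.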
If not connected to the Internet we can read the data directly from a
local copy of the CSV file. The
rattle (G. Williams 2020) package (once the package has been installed)
provides a smaller sample of the dataset. The location of the
CSV file within rattle (G. Williams 2020) is determined using
base::system.file(). Knowing that CSV files
are located within the
csv sub-directory of the
rattle (G. Williams 2020) package, we generate the string that
identifies the file system path to the sample file.
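A minimal sketch of this lookup follows. It locates the csv sub-directory of the installed rattle package with base::system.file() and lists the CSV files found there. Selecting the file by matching "weatherAUS" in its name is an assumption about how the sample file is named, so check it against the listing.

```r
# Locate the csv sub-directory of the installed rattle package.
# system.file() returns an empty string if nothing is found.
csvdir <- system.file("csv", package = "rattle")

# Default to an empty result in case rattle is not installed.
dspath <- character(0)

if (nzchar(csvdir)) {
  # List the CSV files shipped with the package, with full paths.
  csvfiles <- list.files(csvdir, pattern = "\\.csv$", full.names = TRUE)
  print(basename(csvfiles))

  # Assumption: the sample file name contains "weatherAUS".
  matched <- grepl("weatherAUS", basename(csvfiles), fixed = TRUE)
  dspath <- csvfiles[matched]
}

dspath
```

If rattle is not installed, or no file name matches, dspath is left empty rather than pointing at a file that does not exist.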
Note that this is a considerably smaller subset of the full weatherAUS dataset and ingesting this rather than the full dataset will lead to different results to those presented here.
If you have separately downloaded the CSV file then you can identify its location. Here we identify that the downloaded file is located in the current working directory.
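One way this might look is sketched below. It assumes the file was saved under the same name as in the URL and was not renamed. The check with file.exists() is an addition for illustration.

```r
# Take the file name from the URL the file was downloaded from.
dsfile <- basename("https://rattle.togaware.com/weatherAUS.csv")

# Assumption: the file was saved, unrenamed, in the current working directory.
dspath <- file.path(getwd(), dsfile)

# Confirm the file is where we expect before attempting to ingest it.
if (!file.exists(dspath)) {
  message("No file found at ", dspath)
}
```

A relative path such as the bare file name would also refer to the current working directory. Building the full path makes the location explicit if the working directory later changes.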
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.