10.21 Data Source

20180908 We begin by identifying a data source and choose the simplest of sources, a text-based CSV (comma-separated values) file, as a typical data source format.

The rattle::weatherAUS dataset from rattle (G. Williams 2023) will again be used. A binary formatted R dataset is provided by the package but the CSV file for the same dataset is available at https://rattle.togaware.com/weatherAUS.csv.

Identify and record the location of the CSV file to analyse. R can ingest data directly from the Internet, and we will illustrate that here. The location of the file (the so-called URL, or uniform resource locator) will be saved as a string in a variable called dspath, the path to the dataset. The following assignment command does this for us. Simply type this into your R script file within RStudio. The command is then executed in RStudio by clicking the Run button whilst the cursor is on that line of the script file.

# Note the source location of a dataset to ingest into R.

dspath <- "http://rattle.togaware.com/weatherAUS.csv"

The assignment operator <- stores the value on the right-hand side (here a string enclosed within quotation marks) in the computer’s memory under the name given on the left-hand side. We can later retrieve the string simply by referring to the R variable dspath.

If we type the name of the variable (dspath) in the R Console at the > prompt, R will respond with the value stored in the variable:

dspath
## [1] "http://rattle.togaware.com/weatherAUS.csv"
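Before ingesting the whole dataset it can be useful to confirm that the recorded location is actually readable. The following is a minimal sketch only: peek_lines is a hypothetical helper name introduced here for illustration (it is not part of R or rattle), and it relies on base::readLines() accepting either a local file path or a URL.

```r
# Hypothetical helper (not part of R or rattle): return the first
# n lines of a text file. readLines() accepts a local path or a URL.

peek_lines <- function(path, n = 3) {
  readLines(path, n = n, warn = FALSE)
}

# Usage, assuming dspath has been assigned as above and an Internet
# connection is available (so it is left commented out here):
#
# peek_lines(dspath)
```

If the call returns a header line of column names followed by comma separated values then dspath points at a readable CSV file.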

If not connected to the Internet we can read the data directly from a local copy of the CSV file. The rattle (G. Williams 2023) package, once installed, provides a smaller sample, weather.csv. The location of the CSV file within rattle is determined using base::system.file(). Knowing that CSV files are located within the csv sub-directory of the rattle package, we generate the string that identifies the file system path to weather.csv.

library(magrittr) # Provides the tee pipe %T>% used below.

dspath <- system.file("csv", "weather.csv", package="rattle") %T>% print()
## [1] "/usr/lib/R/site-library/rattle/csv/weather.csv"

This is the path to the CSV file on my file system. Your path may well be different depending on where your system installed the rattle (G. Williams 2023) package.

Note that this is a considerably smaller subset of the full weatherAUS dataset and ingesting this rather than the full dataset will lead to different results to those presented here.
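When the package or the file cannot be found, base::system.file() returns an empty string rather than raising an error, so it is worth checking the result before using it. A minimal sketch of such a check follows; the wording of the messages is illustrative only.

```r
# Locate the sample CSV file shipped with the rattle package.
# system.file() returns "" if the package or the file is not found.

dspath <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(dspath)) {
  message("Found local sample: ", dspath)
} else {
  message("weather.csv not found: is the rattle package installed?")
}
```

Alternatively, passing mustWork=TRUE to system.file() turns a missing file into an error.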

If you have separately downloaded weatherAUS.csv then you can identify its location. Here we identify that the downloaded file is located in the current working directory.

dspath <- "./weatherAUS.csv"
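The choice between a downloaded copy and the Internet location can also be made by the script itself. The sketch below is one possible approach rather than a recommended method: choose_dspath is a hypothetical helper name, and it simply prefers a local copy of the full dataset when one exists and otherwise falls back to the URL used earlier.

```r
# Hypothetical helper: prefer a local copy of the full dataset if it
# exists, otherwise fall back to the Internet location.

choose_dspath <- function(local = "./weatherAUS.csv",
                          url = "http://rattle.togaware.com/weatherAUS.csv") {
  if (file.exists(local)) {
    local
  } else {
    url
  }
}

dspath <- choose_dspath()
```

A relative path such as ./weatherAUS.csv is resolved against the current working directory, which base::getwd() reports.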

References

Williams, Graham. 2023. Rattle: Graphical User Interface for Data Science in R. https://rattle.togaware.com/.


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0