10.10 Dollar to Numeric Conversion

20200813 Dollar amounts ingested from other applications might be formatted as strings with a $ prefix and commas as thousands separators. Use readr::parse_number() to convert such strings to numeric values. This then allows us to perform calculations on the amounts.
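As a minimal sketch of the conversion on its own, the following applies readr::parse_number() to a few strings taken from the sample output in this section. The variable names amounts and parsed are illustrative only and are not part of the original pipeline.

```r
library(readr)

# A few dollar strings in the same format as the income column.
amounts <- c("$81,838.00", "$7,568.23", "$291,416.11")

# parse_number() drops the leading "$" and the "," grouping marks
# and returns a numeric (double) vector.
parsed <- parse_number(amounts)

print(parsed)
print(class(parsed))
```

The result is an ordinary numeric vector, so arithmetic such as sum(parsed) works directly.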

In the example below we use readr::parse_number() within a pipeline. The pipeline begins by ingesting the data from a CSV file using readr::read_csv(). We then normalise the variable names using janitor::clean_names(). A tee pipe (the %T>% operator from magrittr) is then used to send the data along two paths. The first path (between the curly brackets) prints a sample from the column to be converted, for information.

The other path continues along the pipeline to perform the actual conversion using dplyr::mutate(). Here readr::parse_number() is applied to the income column to transform the dollar amounts from strings to numeric values.

Another tee pipe is then used to print the result, for information and confirmation, whilst the dataset is also assigned to the variable dollars.
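The behaviour of the tee pipe can be seen in isolation in the following minimal sketch. It assumes only the magrittr package, and the vector values is a made-up stand-in for the dataset. The expression in curly brackets is run for its side effect (printing), after which the original left-hand side continues down the pipeline.

```r
library(magrittr)

# Hypothetical stand-in for the data flowing through the pipeline.
values <- c(3, 1, 2)

# %T>% evaluates the braces for their side effect and then passes
# the original left-hand side (not the result of cat()) onward.
sorted <- values %T>%
  {print(head(., 2)); cat("\n")} %>%
  sort()

print(sorted)
```

Had a regular pipe been used in place of the tee pipe, the value passed to sort() would have been the result of cat(), which is NULL, rather than the data.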

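library(readr)     # read_csv(), parse_number()
library(janitor)   # clean_names()
library(dplyr)     # pull(), mutate()
library(magrittr)  # %T>% tee pipe

IFILE <- "data/dollars.csv"

IFILE %>%
  read_csv() %>%
  clean_names(numerals="right") %T>%
  {pull(., income) %>% head(20) %>% print(); cat("\n")} %>%
  mutate(income=parse_number(income)) %T>%
  {pull(., income) %>% head(20) %>% print(); cat("\n")} ->
dollars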
## Rows: 2000 Columns: 3
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): income
## dbl (2): id, age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##  [1] "$81,838.00"  "$72,099.00"  "$154,676.74" "$27,743.82"  "$7,568.23"  
##  [6] "$33,144.40"  "$43,391.17"  "$59,906.65"  "$126,888.91" "$52,466.49" 
## [11] "$291,416.11" "$24,155.31"  "$143,254.86" "$120,554.81" "$34,919.16" 
## [16] "$67,176.79"  "$9,608.48"   "$12,475.84"  "$32,963.39"  "$31,534.97" 
## 
##  [1]  81838.00  72099.00 154676.74  27743.82   7568.23  33144.40  43391.17
##  [8]  59906.65 126888.91  52466.49 291416.11  24155.31 143254.86 120554.81
## [15]  34919.16  67176.79   9608.48  12475.84  32963.39  31534.97
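Once the column is numeric it can be used in calculations. The following is a sketch only, using a hypothetical three-row tibble (sample_dollars) built from the first three values printed above rather than the actual data/dollars.csv file. The summary column names are likewise chosen for illustration.

```r
library(tibble)
library(dplyr)
library(readr)

# Hypothetical three-row stand-in for the ingested dataset.
sample_dollars <- tibble(
  id     = 1:3,
  income = c("$81,838.00", "$72,099.00", "$154,676.74")
)

# Convert the string column to numeric, as in the pipeline above.
converted <- sample_dollars %>%
  mutate(income = parse_number(income))

# Numeric summaries are now possible on the income column.
summary_stats <- converted %>%
  summarise(total = sum(income), mean = mean(income), max = max(income))

print(summary_stats)
```

Attempting the same summarise() on the original string column would fail, since sum() and mean() are not defined for character data.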


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0