7.8 Random Dataset

20200421 Using wakefield::r_data_frame() it is straightforward to create simple random datasets. More sophisticated sampling is available when needed, though extra work is required to produce correlated distributions across variables when the aim is to closely match real-world data. Nonetheless, creating random data quickly is the forte of wakefield (Rinker 2020).

library(wakefield)    # Generate random datasets.
library(dplyr)        # Provides glimpse() used below.

nobs <- 100 # Number of observations.

r_data_frame(n=nobs,
             id=id_factor,
             givennames=name,
             surname=name,
             dob=dob(start=Sys.Date()-365*100, k=365*99),
             sex=sex,
             phone=r_sample(x=400000000:499999999),
             diabetes=answer,
             result=r_sample_factor(x=c("pending", "positive", "negative")),
             previous=answer(prob=c(0.8,0.2)),
             bps=r_sample(90:135),
             temp=normal(36.3, 1.2),
             gcs=r_sample(3:15, prob=c(rep(0.01, 12), 0.88)),
             crp=normal(2.5, 2, min=0, max=200)) ->
random_ds
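
Each prob= vector in the call above has to line up, element by element, with the values being sampled. As a minimal sketch of that idea, the base R code below checks the weights used for gcs and simulates from them with sample() directly, rather than relying on anything specific to wakefield. The names gcs_values, gcs_prob and gcs_sim, the seed, and the sample size are assumptions made for illustration.

```r
# Illustrative check of the weights used for gcs above (base R only).
gcs_values <- 3:15                      # 13 possible scores.
gcs_prob   <- c(rep(0.01, 12), 0.88)    # One weight per score.

# The weights must match the values in length and sum to 1.
stopifnot(length(gcs_prob) == length(gcs_values))
stopifnot(isTRUE(all.equal(sum(gcs_prob), 1)))

set.seed(42)                            # Arbitrary seed for repeatability.
gcs_sim <- sample(gcs_values, size=10000, replace=TRUE, prob=gcs_prob)

# Proportion of simulated scores equal to 15, expected to be near 0.88.
mean(gcs_sim == 15)
```

With these weights, most simulated scores are 15, which is consistent with the gcs column in the output that follows.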

glimpse(random_ds)
## Rows: 100
## Columns: 13
## $ id         <fct> 001, 002, 003, 004, 005, 006, 007, 008, 009, 010, 011, 012,…
## $ givennames <chr> "Tremont", "Nikyta", "Cleaston", "Oteria", "Kennedii", "Mar…
## $ surname    <chr> "Edyta", "Amryn", "Adyaan", "Seairah", "Charliyah", "Kathey…
## $ dob        <date> 1949-02-21, 1994-01-08, 2014-04-20, 1997-10-04, 2000-02-06…
## $ sex        <fct> Female, Male, Male, Male, Male, Male, Male, Female, Female,…
## $ phone      <int> 451227361, 405775828, 465460696, 404000096, 445678580, 4626…
## $ diabetes   <fct> Yes, No, Yes, No, Yes, No, No, No, No, Yes, Yes, Yes, No, N…
## $ result     <fct> pending, pending, positive, positive, pending, pending, pen…
## $ previous   <fct> Yes, Yes, No, No, No, No, No, No, No, No, Yes, Yes, No, Yes…
## $ bps        <int> 128, 104, 109, 132, 125, 105, 90, 120, 99, 114, 121, 92, 13…
## $ temp       <dbl> 34.43041, 36.71445, 34.08584, 36.18020, 36.33505, 35.93572,…
## $ gcs        <int> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
## $ crp        <dbl> 5.5074746, 3.9032439, 2.4291912, 0.8751326, 3.1985699, 2.51…
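
Each column above is produced by its own variable function, so nothing in the call ties one column, such as temp, to another, such as crp. As a hedged sketch of the extra work that correlation involves, the base R code below builds two correlated normal variables by mixing independent draws. The target correlation rho, the seed, the sample size, and the names temp_sim and crp_sim are assumptions made for illustration and do not come from any real data.

```r
set.seed(42)     # Arbitrary seed so the sketch is repeatable.
n   <- 1000      # Number of simulated observations.
rho <- 0.6       # Assumed target correlation (illustrative only).

z1 <- rnorm(n)   # Two independent standard normal draws.
z2 <- rnorm(n)

# Mixing z1 into the second variable induces a correlation of rho.
z_temp <- z1
z_crp  <- rho * z1 + sqrt(1 - rho^2) * z2

# Rescale to the means and standard deviations used above.
temp_sim <- 36.3 + 1.2 * z_temp
crp_sim  <- 2.5  + 2.0 * z_crp

cor(temp_sim, crp_sim)   # Sample correlation, close to rho.
```

Unlike the crp column above, crp_sim is not truncated at 0, so negative values can occur. Truncating or otherwise transforming the values afterwards would generally shift the correlation away from rho.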

References

Rinker, Tyler. 2020. Wakefield: Generate Random Data Sets. https://github.com/trinker/wakefield.


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0