8.11 Random Seed

In much of our modelling we will randomly sample datasets. Sampling draws on a sequence of pseudo-random numbers, and that sequence can be made repeatable by initialising the generator with a "randomly" selected seed. We do this so that the examples presented throughout this book can be replicated. We will shortly identify a random training dataset as a subset of the whole dataset. To ensure the same random subset is selected each time, we initialise the random number generator with a specific seed using base::set.seed(). For no particular reason we choose 42 as the seed.

seed <- 42
set.seed(seed)
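
As a minimal sketch of what the seed buys us, the following resets the generator before each call to base::sample() and checks that the same indices come back. The 100 observations and the 70% training proportion are assumptions made here for illustration only.

```r
# Hypothetical setup: 100 observations, 70% sampled for training.
seed   <- 42
nobs   <- 100
ntrain <- round(0.7 * nobs)

# Initialise the generator and draw a random sample of row indices.
set.seed(seed)
train1 <- sample(nobs, ntrain)

# Re-initialise with the same seed and draw again.
set.seed(seed)
train2 <- sample(nobs, ntrain)

# Both draws contain the same indices in the same order.
identical(train1, train2)

# Drawing again without resetting the seed continues the sequence,
# so this sample will almost certainly differ from the first.
train3 <- sample(nobs, ntrain)
identical(train1, train3)
```

Resetting the seed immediately before the sampling step, rather than once at the top of a script, keeps the sample stable even if other random calls are later added in between.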

It is worth noting that many model builders use heuristics to search for a good model rather than the best model. This is often necessary because finding the best model means searching an enormous space of all possible models, and the computation required will generally be prohibitive, potentially years of computer time. Our algorithms use heuristics to reduce the computational requirements to something feasible. Such heuristics often involve some level of random decision making in choosing which paths to follow in the search. Setting the seed for the random number generator to a known initial value ensures we can replicate the model building later.
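
The effect on a model builder can be sketched with stats::kmeans(), which chooses its initial cluster centres at random. k-means and the built-in iris data are stand-ins chosen here for illustration; they are not the model builder or dataset used in this chapter.

```r
# Hypothetical data: two numeric columns of the built-in iris dataset.
ds <- iris[, c("Sepal.Length", "Petal.Length")]

seed <- 42

# kmeans() picks its starting centres at random, so the result
# depends on the state of the random number generator.
set.seed(seed)
model1 <- kmeans(ds, centers = 3)

# Rebuild the model from the same seed.
set.seed(seed)
model2 <- kmeans(ds, centers = 3)

# Both runs start from the same centres and so produce the
# same cluster assignments and the same final centres.
identical(model1$cluster, model2$cluster)
identical(model1$centers, model2$centers)
```

Without the calls to set.seed() the two runs may start from different centres and can settle on different clusterings, which is the kind of variation the seed is meant to remove.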



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0