15.6 Hyper-Parameter Tuning

Machine learning algorithms, whether parametric or non-parametric, have hyper-parameters that require tuning to obtain the best possible model. For example, a parametric algorithm like the Support Vector Machine (SVM) has hyper-parameters that include the kernel type (linear, polynomial, sigmoid, or another function), a regularisation cost, the polynomial degree, and a gamma coefficient for the polynomial and sigmoid kernels. The non-parametric algorithm k-Nearest Neighbours (kNN) requires the number of neighbours to be specified. These parameters, which are distinct from the parameters learned by the machine learning algorithm itself, are referred to as hyper-parameters.
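
As a concrete illustration, the following sketch shows where these hyper-parameters appear in practice, assuming scikit-learn's SVC and KNeighborsClassifier as the implementations (the specific values chosen are illustrative only):

# A minimal sketch of the hyper-parameters mentioned above, assuming
# scikit-learn's SVC and KNeighborsClassifier as the implementations.
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# SVM hyper-parameters: the kernel type, the regularisation cost C,
# the polynomial degree, and gamma for the polynomial/sigmoid kernels.
svm = SVC(kernel="poly", C=1.0, degree=3, gamma="scale")

# kNN hyper-parameter: the number of neighbours to consider.
knn = KNeighborsClassifier(n_neighbors=5)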

Different values for the hyper-parameters can result in quite different models, with quite different performance metrics. For a long time the skill of a Data Scientist lay in choosing the right hyper-parameters. Automated approaches to hyper-parameter tuning have since been developed to remove this burden from the Data Scientist.

Hyper-parameter tuning algorithms explore many combinations of hyper-parameters to find a good, or even optimal, combination. A measure is required to compare the quality (or often the performance) of candidate models. The hyper-parameter tuning algorithm will attempt to minimise this loss function by searching through what is generally a very large search space.
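
As a small sketch of such a measure, assuming scikit-learn and its bundled iris dataset purely for illustration, the cross-validated accuracy of a single candidate combination can be computed as follows (scikit-learn maximises a score, which is equivalent to minimising a loss):

# Score one candidate hyper-parameter combination using
# cross-validated accuracy as the comparison measure.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

model = SVC(kernel="rbf", C=1.0, gamma="scale")
score = cross_val_score(model, X, y, cv=5).mean()
print(f"Mean cross-validated accuracy: {score:.3f}")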

A grid search algorithm performs an exhaustive search over the full search space (i.e., all possible hyper-parameter combinations) to find the best model. This is guaranteed to return the best model within the grid, but it is computationally expensive for even a moderate number of hyper-parameters and candidate values.
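
A minimal grid search sketch, again assuming scikit-learn, where GridSearchCV exhaustively evaluates every combination in the grid (here 3 kernels × 3 cost values × 2 gamma settings, giving 18 candidates):

# Exhaustive grid search over a small hyper-parameter grid.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

grid = {
    "kernel": ["linear", "poly", "sigmoid"],
    "C": [0.1, 1.0, 10.0],
    "gamma": ["scale", "auto"],
}

search = GridSearchCV(SVC(), grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)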

A random search algorithm randomly selects hyper-parameter combinations to evaluate, continuing only whilst the best model's performance remains below a chosen threshold. A common approach is to specify a computational budget and then choose the best model found once the budget has been exhausted.
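
A corresponding random search sketch using scikit-learn's RandomizedSearchCV, where the computational budget is expressed as n_iter, the number of randomly sampled combinations to evaluate. Sampling C and gamma from log-uniform distributions is a common choice, as it covers several orders of magnitude evenly:

# Random search within a fixed budget of 20 sampled combinations.
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

distributions = {
    "kernel": ["linear", "poly", "sigmoid"],
    "C": loguniform(1e-2, 1e2),      # sample the cost on a log scale
    "gamma": loguniform(1e-4, 1e1),
}

search = RandomizedSearchCV(SVC(), distributions, n_iter=20,
                            cv=5, random_state=42)
search.fit(X, y)
print(search.best_params_, search.best_score_)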
