Hyperparameters are model parameters that are estimated without using actual, observed data. In other words, a hyperparameter is a "good guess" at what a model's parameters might be, made before your data come into play.
More formally, a hyperparameter is a parameter of a prior distribution; it captures a prior belief, held before any data are observed (Riggelsen, 2008). For example, the hyperparameter η might be a prior guess for the mean (μ) of some distribution X. Although the prior distribution usually can't be described in full, it is often possible to make reasonable guesses about its hyperparameters and thus construct a reasonable prior.
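To make this concrete, here is a minimal sketch of a conjugate Normal-Normal model, where the prior on an unknown mean μ is Normal(η, τ²). The specific values of η, τ, the sampling standard deviation, and the simulated data are illustrative assumptions, not part of the definition above.

```python
import numpy as np

# Prior belief about the unknown mean mu: Normal(eta, tau^2).
# eta and tau are hyperparameters, chosen before seeing any data.
eta, tau = 5.0, 2.0   # hypothetical prior mean and prior std. dev.
sigma = 1.0           # assumed known sampling std. dev.

rng = np.random.default_rng(0)
data = rng.normal(loc=6.0, scale=sigma, size=20)  # simulated observations

# Conjugate Normal-Normal update: the posterior mean is a
# precision-weighted average of the prior guess eta and the sample mean.
n = len(data)
post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
post_mean = post_var * (eta / tau**2 + data.sum() / sigma**2)
print(f"posterior for mu: mean={post_mean:.3f}, sd={post_var**0.5:.3f}")
```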
The term "hyperparameter" distinguishes these prior "guess" parameters from the parameters estimated from data, such as the coefficients in a regression analysis.
Types
Most hyperparameters are one of two types (Fred et al., 2016); a short sketch of both types follows the list:
- Numerical (Hnum): can be a real number or an integer value; these are usually bounded by a reasonable minimum value and maximum value.
- Categorical (Hcat): one value is chosen from a set of possible values.
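For illustration, here is a minimal sketch of a search space mixing both types. The parameter names and ranges are hypothetical, not tied to any particular model:

```python
# Hypothetical search space mixing both hyperparameter types.
search_space = {
    "learning_rate": (1e-4, 1e-1),       # numerical (real), bounded
    "n_neighbors": (1, 50),              # numerical (integer), bounded
    "weights": ["uniform", "distance"],  # categorical: one value from a set
}
```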
Use in Data Mining
In data mining, a hyperparameter is a prior parameter that must be tuned in order to optimize model performance (Witten et al., 2016). One example is the "k" in the k-nearest neighbors algorithm. These parameters must be tuned on the training set only, without looking at the test data, because using the test data to choose them introduces bias.
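As a concrete sketch, the following tunes k by cross-validation on the training data only; the use of scikit-learn and the Iris dataset is an assumption for illustration, and any library with cross-validation would do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Hold out a test set up front; k is chosen using the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Score each candidate k by 5-fold cross-validation on the training data,
# so the test set never influences the choice of k.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, y_train, cv=5).mean()
          for k in range(1, 16)}
best_k = max(scores, key=scores.get)

# Refit with the chosen k, then report performance on the untouched test set.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print(f"best k = {best_k}, test accuracy = {final_model.score(X_test, y_test):.3f}")
```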
The process of finding the optimal hyperparameters for a machine learning model is called hyperparameter optimization.
Common algorithms include:
- Bayesian Optimization: uses a probabilistic model to choose which hyperparameters to try next, based on the performance of past choices.
- Grid Search: brute-forces every combination of values from a predefined grid.
- Random Search: randomly samples hyperparameter sets from a specified probability distribution and evaluates them (a sketch contrasting grid and random search follows this list).
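Here is a minimal sketch contrasting grid search and random search on a toy objective. The score function and the parameter names/ranges are stand-ins for a real train-and-validate step, chosen only for illustration:

```python
import itertools
import random

# Toy stand-in for "train a model and return a validation score";
# its peak is at lr = 0.01, batch_size = 64.
def score(lr, batch_size):
    return -(lr - 0.01) ** 2 - ((batch_size - 64) / 100) ** 2

# Grid search: brute-force every combination on a predefined grid.
lrs = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64, 128]
grid_best = max(itertools.product(lrs, batch_sizes), key=lambda p: score(*p))

# Random search: draw the same number of candidates from distributions
# (log-uniform for lr, uniform choice for batch size) instead of a grid.
random.seed(0)
candidates = [(10 ** random.uniform(-4, -1), random.choice(batch_sizes))
              for _ in range(len(lrs) * len(batch_sizes))]
random_best = max(candidates, key=lambda p: score(*p))

print("grid search best:   ", grid_best)
print("random search best: ", random_best)
```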
References
Fred, A. et al. (2016). Pattern Recognition: Applications and Methods: 4th International Conference, ICPRAM 2015, Lisbon, Portugal, January 10-12, 2015, Revised Selected Papers. Springer. Retrieved March 15, 2018 from: https://books.google.com/books?id=Bm9aCwAAQBAJ
NERSC. Optimization. Retrieved March 15, 2018 from: http://www.nersc.gov/users/data-analytics/data-analytics-2/deep-learning/hyperparameter-o/
Riggelsen, C. (2008). Approximation Methods for Efficient Learning of Bayesian Networks. IOS Press.
Witten, I. et al. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.