What is a Loss Function?
A loss function is a way to measure how far off a prediction or decision is from the true value or outcome. Its purpose is to give us a way to optimize model parameters to improve accuracy. Different loss functions exist for different tasks, such as classification and regression, and for different ways of quantifying the gap between actual and predicted values.
One way to think of a loss function is as a “scorecard” that helps you quantify how good or bad a model is:
- Lower loss values = better model performance.
- Higher loss values = worse model performance.
In practice, the loss function assigns a numerical value to a prediction’s error, giving us a precise way to evaluate a model’s performance. The best model is the one with the lowest loss value, and predictive accuracy is improved by minimizing the loss function.
An analogy: Imagine you’re throwing darts at a dartboard. The bullseye is the “true value,” and the distance of your darts from the bullseye represents the “error.” The loss function would assign a numerical value to these distances so you can figure out how well you did and adjust your aim for the next throw. The goal is to improve your aim and eventually hit the true value (the bullseye).
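To make the “scorecard” idea concrete, here is a minimal NumPy sketch (the true values and the two models’ predictions are invented for illustration) that scores two hypothetical models with an average-squared-error loss; the model with the lower score is the better one:

```python
import numpy as np

# Hypothetical true values and predictions from two competing models.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
pred_a = np.array([2.8, 5.1, 2.4, 6.9])   # close to the truth
pred_b = np.array([1.0, 8.0, 0.5, 9.5])   # far from the truth

def mean_squared_error(y_true, y_pred):
    """Average of squared differences -- one common 'scorecard'."""
    return np.mean((y_true - y_pred) ** 2)

print(mean_squared_error(y_true, pred_a))  # small loss -> better model
print(mean_squared_error(y_true, pred_b))  # large loss -> worse model
```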
Common Types of Loss Functions
There are several types of loss functions, each suited to a specific kind of problem; minimal code sketches for the regression, classification, and distribution-based losses follow the list:
- Mean Squared Error (MSE): Commonly used in regression, MSE measures the average squared difference between predicted and actual values. The differences are squared so that larger errors receive heavier penalties.
- Mean Absolute Error (MAE, or L1 Loss): Measures the average absolute difference between predicted and actual values. Because the differences aren’t squared, MAE is less sensitive to large errors (outliers) than MSE.
- Cross-Entropy Loss: Used when the output is a class label (such as spam or not spam) or a probability distribution. Cross-entropy loss measures the difference between two probability distributions: the model’s predicted probabilities and the true outcomes (often represented as a one-hot encoded vector). It penalizes overconfidence: for example, if you predict an email is spam with 90% certainty but it’s not spam, cross-entropy penalizes that overconfident prediction heavily.
- Hinge Loss: Used for tasks such as support vector machines (SVMs), where the focus is on classification boundaries. It penalizes incorrect classifications as well as correct ones that fall too close to the decision boundary.
- Huber Loss: A combination of MSE and MAE. It is quadratic for small errors and linear for large errors, which makes it less sensitive to outliers than MSE but more so than MAE. Use when you want robustness to outliers.
- Log-Cosh Loss: The logarithm of the hyperbolic cosine of the prediction error. It is similar to MSE but is less sensitive to large errors. Suitable for regression problems where you want a smooth loss function that is robust to outliers.
- KL Divergence: Measures the difference between two probability distributions, often between the true distribution and the predicted distribution. Commonly used in machine learning models that output probability distributions, such as variational autoencoders.
- Poisson Loss: Used for modeling count data where the target variable is assumed to follow a Poisson distribution. Useful in regression problems involving counts, such as the number of events occurring in a fixed interval.
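As a sketch of the regression losses above (MSE, MAE, Huber, and Log-Cosh), here is a minimal NumPy implementation; the sample data is invented for illustration, and the delta threshold for Huber loss is an assumed default:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: squaring penalizes large errors heavily.
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean Absolute Error (L1): linear in the error, less outlier-sensitive.
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it.
    err = y_true - y_pred
    small = np.abs(err) <= delta
    return np.mean(np.where(small,
                            0.5 * err ** 2,
                            delta * (np.abs(err) - 0.5 * delta)))

def log_cosh(y_true, y_pred):
    # Smooth loss: roughly quadratic near zero, linear for large errors.
    return np.mean(np.log(np.cosh(y_pred - y_true)))

y_true = np.array([2.0, 3.5, 5.0, 10.0])
y_pred = np.array([2.2, 3.0, 4.5, 20.0])   # last point is an outlier
for name, fn in [("MSE", mse), ("MAE", mae), ("Huber", huber), ("Log-Cosh", log_cosh)]:
    print(name, fn(y_true, y_pred))
```

Running this shows how the outlier dominates MSE while MAE, Huber, and Log-Cosh grow only linearly with it.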
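For the classification losses, here is a minimal sketch of binary cross-entropy and hinge loss; the spam example mirrors the 90%-confidence scenario described above, and the label conventions (0/1 for cross-entropy, -1/+1 for hinge) are the usual ones:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true in {0, 1}; p_pred is the predicted probability of class 1.
    p = np.clip(p_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge(y_true, scores):
    # y_true in {-1, +1}; scores are raw margins (as in an SVM).
    # Zero loss only when the prediction is correct AND beyond the margin.
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

# "Not spam" (label 0) predicted as spam with 90% confidence -> heavy penalty.
print(binary_cross_entropy(np.array([0.0]), np.array([0.9])))

# One confident correct, one weakly correct, one correct but inside the margin.
print(hinge(np.array([1, -1, 1]), np.array([2.0, -0.3, 0.4])))
```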
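For the distribution-based losses, here is a minimal sketch of KL divergence and Poisson loss; the Poisson loss shown is the negative log-likelihood up to a constant (one common formulation), and the example distributions and counts are made up:

```python
import numpy as np

def kl_divergence(p_true, q_pred, eps=1e-12):
    # D_KL(P || Q): how much the predicted distribution Q diverges from P.
    p = np.clip(p_true, eps, 1.0)
    q = np.clip(q_pred, eps, 1.0)
    return np.sum(p * np.log(p / q))

def poisson_loss(y_true, y_pred, eps=1e-12):
    # For non-negative count targets; y_pred is the predicted rate (lambda).
    # Negative Poisson log-likelihood up to a constant term.
    lam = np.clip(y_pred, eps, None)
    return np.mean(lam - y_true * np.log(lam))

p = np.array([0.7, 0.2, 0.1])       # true class distribution
q = np.array([0.5, 0.3, 0.2])       # model's predicted distribution
print(kl_divergence(p, q))

counts = np.array([0, 2, 4, 1])     # observed event counts
rates  = np.array([0.5, 2.5, 3.0, 1.0])  # predicted rates
print(poisson_loss(counts, rates))
```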