# Greedy Algorithm & Greedy Matching in Statistics

## What is a Greedy Algorithm?

The greedy algorithm is one of the simplest algorithms to implement: take the closest/nearest/most optimal option, and repeat. It always chooses which element of a set seems to be the best at the moment. It never changes its mind at a later point.

Greedy algorithms are “top-down”, which mean that the algorithm makes one greedy choice and then another, reducing large problems to smaller ones. The idea is that by choosing the tastiest (most optimal) element at any moment, the overall system will eventually be optimized. Most problems cannot be optimized by a greedy algorithm, but it does work for some cases (like greedy matching).

For an example of how a simple greedy algorithm works, see: Boruvka’s Algorithm (Sollin’s Algorithm), which is a way to find a minimum spanning tree where the sum of edge weights is minimized.

## Greedy Matching Algorithm

The goal of a greedy matching algorithm is to produce matched samples with balanced covariates (characteristics) across the treatment group and control group. It can generate one-to-one or one-to-many matched pairs sampled without replacement. Sampling without replacement means that once a person has been matched, they cannot be used in another match.

Greedy nearest neighbor is a version of the algorithm that works by choosing a treatment group member and then choosing a control group member that is the closest match. For example:

1. Choose the participant with the highest propensity score (a propensity score is the probability of being assigned to the treatment group).
2. Select a control group member with the closest propensity score to the person picked in Step 1.
3. Choose a second treatment group member (in this example, with the next highest propensity score rank), match the second participant.
4. Repeat the process until all participants are matched.

You don’t have to start with the participant with the highest rank propensity score. Other options are:

• Lowest to highest score.
• Best possible match: a pair is chosen with the closest scores. They are removed from the set, then another pair is chosen with the next closest scores, and so on.
• Random selection: pairs are chosen at random using a random seed.
• Caliper matching: a maximum caliper distance is set for the matches. A caliper distance is the absolute difference in propensity scores for the matches. As a maximum value is being set, this may result in some participants not being matched.