Statistics How To

Nearest Neighbor Matching: Definition

Nearest neighbor matching solves a matching problem by pairing a given point with another, ‘closest’ point. It is used in many very different fields, from data compression to DNA sequencing.

Searching for a Nearest Neighbor

Nearest neighbor matching can be carried out in most statistics software with a simple command. It can use either a “greedy” algorithm, which works through the data points one at a time and matches each to the closest option that is still unmatched, or a more sophisticated “optimal matching” algorithm, which minimizes the total distance across all matched pairs.
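The difference between the two approaches can be seen in a small sketch. It assumes one-dimensional values, absolute difference as the distance, and at least as many controls as treated points. The function names (greedy_match, optimal_match, total_distance) are made up for illustration, and the “optimal” version simply tries every possible assignment, which is only practical for tiny examples; real software uses far more efficient algorithms.

```python
from itertools import permutations

def greedy_match(treated, controls):
    """Match each treated value, in order, to the closest still-unmatched control."""
    available = set(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(available, key=lambda k: abs(t - controls[k]))
        available.remove(j)
        pairs.append((i, j))
    return pairs

def optimal_match(treated, controls):
    """Brute force: try every assignment and keep the smallest total distance."""
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda perm: sum(abs(t - controls[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))

def total_distance(pairs, treated, controls):
    """Sum of distances over all matched (treated index, control index) pairs."""
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)

# Illustrative data: the first treated point "uses up" the control
# that would have been a much better match for the second one.
treated = [0.5, 0.6]
controls = [0.58, 0.1]

greedy_pairs = greedy_match(treated, controls)
optimal_pairs = optimal_match(treated, controls)
print(greedy_pairs, total_distance(greedy_pairs, treated, controls))
print(optimal_pairs, total_distance(optimal_pairs, treated, controls))
```

In this example the greedy pass gives the first point its closest control and leaves the second point with a poor match, while the exhaustive search accepts a slightly worse match for the first point to get a smaller total distance.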

Sometimes nearest neighbor matching is also run with replacement, where each member of the target set can be a match for more than one data point. With matching without replacement, each member of the target set can be used only once.
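As a rough illustration of the two modes, the sketch below matches three data points against a small target set. The function name nearest_neighbor and its replace flag are assumptions made for this example, not the interface of any particular statistics package, and distance is taken to be the absolute difference between single values.

```python
def nearest_neighbor(points, targets, replace=False):
    """Return {point index: matched target index, or None if no target is left}."""
    used = set()
    matches = {}
    for i, p in enumerate(points):
        # With replacement every target stays eligible; without, skip used ones.
        candidates = [j for j in range(len(targets)) if replace or j not in used]
        if not candidates:
            matches[i] = None
            continue
        j = min(candidates, key=lambda k: abs(p - targets[k]))
        matches[i] = j
        used.add(j)
    return matches

# Illustrative data: every point is closest to the first target.
points = [0.30, 0.32, 0.35]
targets = [0.31, 0.50, 0.90]

with_replacement = nearest_neighbor(points, targets, replace=True)
without_replacement = nearest_neighbor(points, targets, replace=False)
print(with_replacement)
print(without_replacement)
```

With replacement, all three points share the same close target. Without replacement, later points have to settle for targets that are further away, so the order in which points are processed starts to matter.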

Setting up a nearest neighbor analysis involves choosing the criteria for ‘closeness’—this could be a list of properties, the value of one particular property, or a propensity score—as well as a definition of ‘distance’ as it relates to the given property.
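To show how much the choice of criteria can matter, here is a minimal sketch with two candidate definitions of distance: straight-line (Euclidean) distance over a list of properties, and the absolute difference in a propensity score. The data, the field names (age, income, score) and the helper functions are all invented for illustration, and the propensity scores are assumed to have been estimated already.

```python
import math

# Hypothetical units: two properties plus an already-estimated propensity score.
target = {"age": 40, "income": 50, "score": 0.40}
candidates = [
    {"age": 41, "income": 52, "score": 0.70},
    {"age": 55, "income": 30, "score": 0.41},
]

def covariate_distance(a, b, keys=("age", "income")):
    """Euclidean distance over a list of properties."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def score_distance(a, b):
    """Absolute difference in propensity score."""
    return abs(a["score"] - b["score"])

def nearest(target, candidates, distance):
    """Index of the candidate closest to the target under the given distance."""
    return min(range(len(candidates)), key=lambda j: distance(target, candidates[j]))

by_covariates = nearest(target, candidates, covariate_distance)
by_score = nearest(target, candidates, score_distance)
print(by_covariates, by_score)
```

Here the same target gets a different nearest neighbor under each definition. Note also that a raw Euclidean distance lets properties measured on larger scales dominate, so in practice properties are usually standardized or weighted first.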

After closeness is defined and an algorithm is chosen, the matching is run. Then the matches need to be assessed; depending on your results, you may need to change your criteria for closeness or choice of algorithm and run the procedure again.
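One common way to assess matches, used here as an assumption since the right check depends on the application, is to compare a property between the two groups before and after matching with a standardized mean difference. The sketch below uses made-up ages; the function name and the data are illustrative only.

```python
from statistics import mean, variance

def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Illustrative ages: the full comparison pool is noticeably older than
# the treated group, while the matched subset is not.
treated_ages = [30.0, 40.0, 50.0]
all_control_ages = [31.0, 39.0, 52.0, 60.0, 65.0, 70.0]
matched_control_ages = [31.0, 39.0, 52.0]

smd_before = standardized_mean_difference(treated_ages, all_control_ages)
smd_after = standardized_mean_difference(treated_ages, matched_control_ages)
print(round(smd_before, 3), round(smd_after, 3))
```

A value closer to zero after matching suggests the matched groups are more alike on that property. If the value stays large, that is a signal to revisit the closeness criteria or the algorithm.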

Tolerance Levels in Nearest Neighbor Matching

Since the nearest neighbor algorithm simply returns the ‘nearest’ neighbor, one can end up with a very bad match if the nearest neighbor is far away. Where this matters, we set ‘tolerance levels’ (i.e. upper limits on the allowed distance, often called calipers) to determine how far the matching algorithm should go in search of the nearest neighbor.
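A tolerance level can be sketched as a simple cut-off applied to the nearest neighbor's distance. The function name match_with_tolerance and the tolerance of 0.05 are assumptions chosen for illustration; a sensible limit depends on the data and on how distance was defined.

```python
def match_with_tolerance(point, targets, tolerance):
    """Return the index of the nearest target, or None if it is too far away."""
    if not targets:
        return None
    j = min(range(len(targets)), key=lambda k: abs(point - targets[k]))
    # Reject the match when even the nearest neighbor exceeds the limit.
    return j if abs(point - targets[j]) <= tolerance else None

targets = [0.42, 0.90]

close_match = match_with_tolerance(0.40, targets, tolerance=0.05)
no_match = match_with_tolerance(0.65, targets, tolerance=0.05)
print(close_match, no_match)
```

The first point finds a neighbor well within the limit. The second point's nearest neighbor is too far away, so it is left unmatched rather than paired badly.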


Nearest Neighbor Matching: Definition was last modified: June 13th, 2018 by Stephanie