Nearest neighbor matching is a solution to a matching problem that involves pairing a given point with another, ‘closest’ point. It is important in many very different fields, from data compression to DNA sequencing.
Searching for a Nearest Neighbor
Nearest neighbor matching can be carried out in most statistics software with a simple command. It can use either a “greedy” algorithm, which works through the points one at a time and pairs each with the closest candidate that is still unmatched, or a more sophisticated “optimal matching” algorithm, which chooses the set of pairs that minimizes the total distance across all matches.
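The difference between the two approaches can be illustrated with a minimal Python sketch. The function names, the use of absolute difference as the distance, and the sample values are all assumptions made for illustration; the "optimal" version here is a brute-force search that is only practical for very small sets, not the algorithm a statistics package would use.

```python
from itertools import permutations

def greedy_match(treated, controls):
    """Pair each treated value, in order, with the closest unused control."""
    unused = set(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        # Closest remaining control; ties broken by lowest index.
        j = min(unused, key=lambda k: (abs(t - controls[k]), k))
        unused.remove(j)
        pairs.append((i, j))
    return pairs

def optimal_match(treated, controls):
    """Brute force: try every assignment and keep the smallest total distance."""
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda perm: sum(abs(t - controls[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))

def total_distance(pairs, treated, controls):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)

# Hypothetical one-dimensional values (e.g. scores scaled to 0-100).
treated = [50, 60]
controls = [58, 10]

greedy = greedy_match(treated, controls)
optimal = optimal_match(treated, controls)
print(greedy, total_distance(greedy, treated, controls))
print(optimal, total_distance(optimal, treated, controls))
```

In this made-up example the greedy pass gives the first point its best match (58) and leaves the second point with a poor one, while the exhaustive search accepts a worse first pair to get a smaller total.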
Nearest neighbor matching can also be run with replacement, where each member of the target set can serve as a match for more than one data point. When matching without replacement, each member of the target set can be used only once.
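As a rough sketch of how the two modes differ, the hypothetical function below takes a `replacement` flag (an assumed name, not a real package option) and uses absolute difference as the distance.

```python
def match(points, targets, replacement=False):
    """Greedy nearest neighbor matching, with or without replacement."""
    available = list(range(len(targets)))
    pairs = []
    for i, p in enumerate(points):
        if not available:
            break  # without replacement, we can run out of targets
        j = min(available, key=lambda k: abs(p - targets[k]))
        pairs.append((i, j))
        if not replacement:
            available.remove(j)  # each target may be used only once
    return pairs

# Hypothetical values for illustration.
points = [50, 52, 90]
targets = [51, 70, 95]

print(match(points, targets, replacement=True))
print(match(points, targets, replacement=False))
```

With replacement, the first two points both pair with the target at 51. Without replacement, the second point has to settle for the next closest target that is still free.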
Setting up a nearest neighbor analysis involves choosing the criteria for ‘closeness’—this could be a list of properties, the value of one particular property, or a propensity score—as well as a definition of ‘distance’ as it relates to the given property.
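To make the choice of criteria concrete, here is a small sketch in which the same search function is run with two different definitions of distance. The property names (age, number of visits), the sample values, and the helper names are assumptions for illustration only.

```python
import math

def abs_difference(a, b):
    """Distance on a single property (or a single score)."""
    return abs(a - b)

def euclidean(a, b):
    """Distance on a list of properties, treated as coordinates."""
    return math.dist(a, b)  # available in Python 3.8+

def nearest(point, candidates, distance):
    """Index of the candidate closest to point under the given distance."""
    return min(range(len(candidates)), key=lambda k: distance(point, candidates[k]))

# Hypothetical records: (age, number of visits).
point = (30, 2)
candidates = [(25, 2), (31, 9), (40, 1)]

# Closeness defined on age alone.
by_age = nearest(point[0], [c[0] for c in candidates], abs_difference)
# Closeness defined on both properties together.
by_both = nearest(point, candidates, euclidean)
print(by_age, by_both)
```

The two definitions pick different neighbors here, which is why the criteria need to be settled before the matching is run. In practice, properties on very different scales are usually standardized first so that one of them does not dominate the distance.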
After closeness is defined and an algorithm is chosen, the matching is run. Then the matches need to be assessed; depending on your results, you may need to change your criteria for closeness or choice of algorithm and run the procedure again.
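One simple way to assess a set of matches is to summarize how far apart the matched pairs are. The sketch below is only one possible check, with assumed names and made-up data; it is not a complete assessment procedure.

```python
from statistics import mean

def summarize_matches(pairs, points, targets):
    """Mean and maximum distance over matched (point index, target index) pairs."""
    distances = [abs(points[i] - targets[j]) for i, j in pairs]
    return {"mean": mean(distances), "max": max(distances)}

# Hypothetical matched pairs and values.
points = [50, 60]
targets = [58, 10]
pairs = [(0, 0), (1, 1)]

summary = summarize_matches(pairs, points, targets)
print(summary)
```

A large maximum relative to the mean suggests that at least one pair is a poor match, which may be a reason to revisit the closeness criteria or the algorithm.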
Tolerance Levels in Nearest Neighbor Matching
Since the nearest neighbor algorithm simply returns the ‘nearest’ neighbor, one can end up with a very bad match if the nearest neighbor is far away. Where this matters, we set ‘tolerance levels’ (i.e. upper limits on the allowed distance, often called calipers) to determine how far the matching algorithm may go in search of the nearest neighbor. A point with no neighbor within the tolerance is left unmatched.
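A tolerance level can be sketched as a simple check on the best candidate's distance. The function name, the `tolerance` parameter, and the sample values below are assumptions for illustration.

```python
def nearest_within_tolerance(point, candidates, tolerance):
    """Index of the nearest candidate, or None if it is farther than tolerance."""
    if not candidates:
        return None
    j = min(range(len(candidates)), key=lambda k: abs(point - candidates[k]))
    return j if abs(point - candidates[j]) <= tolerance else None

# Hypothetical values for illustration.
candidates = [10, 40]

close = nearest_within_tolerance(12, candidates, tolerance=5)
far = nearest_within_tolerance(27, candidates, tolerance=5)
print(close, far)
```

The first point is within 5 units of a candidate and gets matched. The second point's nearest candidate is 13 units away, so it is left unmatched rather than paired badly.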
Caliendo & Kopeinig. Some Practical Guidance for the Implementation of Propensity Score Matching. IZA Discussion Paper No. 1588. May 2005. Retrieved from http://ftp.iza.org/dp1588.pdf on April 13, 2018
Stuart, Elizabeth. Matching methods for causal inference: Designing observational studies. For Best Practices in Quantitative Methods. Retrieved from http://www.biostat.jhsph.edu/~estuart/StuRub_MatchingChapter_07.pdf on April 13, 2018
Stuart, Elizabeth. Matching methods for causal inference: A review and a look forward. November 2009. Retrieved from http://www.ics.uci.edu/~sternh/courses/265/stuart_matching.pdf on April 13, 2018