Latent Semantic Analysis: Simple Definition, Method

What is Latent Semantic Analysis?

Latent Semantic Analysis (LSA) is a way to analyze how words and groups of words are used in texts. It is used to answer questions like:

  • What is the underlying meaning of the text?
  • What effect do words have on the meaning of passages?
  • How does the average meaning of words in a passage relate to the overall meaning of a passage?

Language (especially the English language) is complex, in part because words have multiple meanings. For example, the word “hot” can mean a variety of things, including “near boiling,” “sexy,” or “priced to sell.” Which meaning applies depends on the context (i.e., the surrounding passage). Because “hot” in one text might mean something completely different from “hot” in another, finding related words, passages, or entire texts is no easy task. LSA attempts to do this by mapping words to concepts like “temperature,” “sex,” or “business.” The words and their linked concepts are then compared to estimate the intended meaning of the text.

Latent semantic analysis is also called latent semantic indexing (LSI).

Method

Figure: A matrix where each element shows how often a word appears in a text.

LSA uses a matrix algebra method called Singular Value Decomposition (SVD) to factorize matrices. SVD is usually impractical to perform by hand for anything more than a small sample of text. In fact, the approach only became popular after the 1980s, when computers were available to handle the heavy computation.
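
To make the factorization concrete, here is a minimal sketch in Python using NumPy's `numpy.linalg.svd`. The small count matrix is invented for illustration and does not come from the cited tutorial. The sketch assumes words are rows and passages are columns.

```python
import numpy as np

# Hypothetical 4-word x 3-passage count matrix (rows: words, columns: passages).
counts = np.array([
    [2, 0, 1],
    [1, 0, 0],
    [0, 3, 1],
    [0, 1, 2],
], dtype=float)

# Thin SVD: counts = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

# Multiplying the three factors back together recovers the original matrix.
reconstructed = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a lower-rank approximation,
# which is the step LSA relies on to compress words and passages into "concepts".
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Even for this tiny matrix the factors are tedious to work out by hand, which is why the computation is normally left to software.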


The basic method is:

  • The text is converted into a matrix of words by passages. Each cell in the matrix contains the number of times a certain word appears in a certain passage.
  • The matrix is factorized so that every passage is represented as a vector. Each passage vector is the sum of the vectors representing its component words.
  • Dot products, cosines or similar metrics are used to measure similarities between words and passages.
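
The three steps above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not a reference implementation: the passages are made up, the number of retained dimensions `k = 2` is an arbitrary choice, and raw counts are used without any weighting.

```python
import numpy as np

# Hypothetical toy passages, invented purely for illustration.
passages = [
    "the soup is hot and the tea is hot",
    "the tea is near boiling",
    "hot price on this house priced to sell",
    "the house is priced to sell",
]

# Step 1: build the word-by-passage count matrix (rows: words, columns: passages).
vocab = sorted({w for p in passages for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(passages)))
for j, p in enumerate(passages):
    for w in p.split():
        counts[index[w], j] += 1

# Step 2: factorize with SVD and keep the k largest singular values (k is an assumption).
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k]                # one k-dimensional vector per word
passage_vecs = Vt[:k, :].T * s[:k]  # one k-dimensional vector per passage

# Each passage vector equals the count-weighted sum of its words' vectors.
summed_word_vecs = counts.T @ word_vecs

# Step 3: compare passages with cosine similarity.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_tea = cosine(passage_vecs[0], passage_vecs[1])    # soup/tea vs. tea/boiling
sim_house = cosine(passage_vecs[0], passage_vecs[3])  # soup/tea vs. house for sale
```

Printing `sim_tea` and `sim_house` shows how close each pair of passages sits in the reduced space. With a corpus this small the numbers are only suggestive; real applications use many more passages and usually weight the counts before factorizing.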

The theory behind the algorithms used in SVD is beyond the scope of this article, but you can read more about it in the University of Victoria tutorial listed in the references.

References

Thomo, A. Latent Semantic Analysis (Tutorial). Retrieved May 28, 2020 from: https://www.engr.uvic.ca/~seng474/svd.pdf

