A Study on Text Similarity Measures

Dr. J. Ujwala Rekha and Dr. K. Shahu Chatrapati

Measuring text similarity between words or terms, phrases, sentences, and documents is a routine task in Information Retrieval Systems and Natural Language Processing. Text similarity measures can determine either lexical or semantic similarity between texts. This study surveys the techniques used to measure both lexical and semantic similarity. Specifically, corpus-based semantic measures, which rely on corpus statistics, and knowledge-based semantic measures, which are derived from specific taxonomies, are described along with their merits and shortcomings. Furthermore, widely used text representation models and word embeddings are presented along with the similarity measures.

Volume 12 | 04-Special Issue

Pages: 1922-1935

DOI: 10.5373/JARDCS/V12SP4/20201681