Measuring text similarity between words or terms, phrases, sentences, and documents is a routine task in Information Retrieval systems and Natural Language Processing. Text similarity measures can determine either lexical or semantic similarity between texts. This study presents a survey of techniques used to measure both lexical and semantic similarity. Specifically, it describes corpus-based semantic measures, which rely on corpus statistics, and knowledge-based semantic measures, which are derived from specific taxonomies, along with their merits and shortcomings. Widely used text representation models and word embeddings are also presented alongside the similarity measures.
Volume 12 | 04-Special Issue
Pages: 1922-1935
DOI: 10.5373/JARDCS/V12SP4/20201681