Analysis of Feature Extraction Methods for Text Clustering with Traditional and Deep Learning Approaches

D. Hemavathi, H. Srimathi and Simbarashe Herbert Chaputsira

Selecting the best features from text data is an important task in information retrieval and text mining. Which words are treated as important features varies with the method used, for example Principal Component Analysis, lexicon features, latent semantic analysis and autoencoders. Text pre-processing plays a vital role before feature selection, since efficient feature extraction can be performed only on pre-processed text data. Pre-processing methods such as stop-word removal, stemming and lemmatization are used effectively on real-time text data. Lexicon feature extraction, latent semantic mapping and concept indexing clustering are some of the well-known existing techniques for feature extraction from text data. Features selected from large amounts of text data using deep learning approaches provide better accuracy than those selected by traditional approaches. We propose a comparative analysis of traditional feature extraction methods and deep learning approaches for selecting the best features from large volumes of text data, such as tweets, in order to categorize the tweets by domain.
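As a concrete reference point for the pre-processing and feature-selection steps named above, the following is a minimal sketch in Python using only the standard library, assuming short English texts such as tweets. The stop-word list and the suffix-stripping `stem` function are simplified stand-ins (a real pipeline would use a full stop-word list and a proper stemmer or lemmatizer), and TF-IDF weighting is swapped in as a simple traditional baseline for ranking candidate features; it is not claimed to be one of the methods evaluated in the paper. The names `preprocess`, `tfidf` and `top_features` are hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (assumption: a real system uses a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "as", "and", "or", "of", "to",
              "in", "on", "for", "with", "this", "that", "it"}

# Crude suffix stripping standing in for a real stemmer such as Porter's (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its relative frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({term: (c / total) * math.log(n / df[term])
                        for term, c in counts.items()})
    return vectors


def top_features(vectors: list[dict[str, float]], k: int) -> list[str]:
    """Rank terms by their summed TF-IDF weight across all documents and keep k."""
    scores: Counter = Counter()
    for vec in vectors:
        scores.update(vec)  # adds each term's weight to its running total
    return [term for term, _ in scores.most_common(k)]


if __name__ == "__main__":
    # Hypothetical tweets from two domains (sport and finance).
    tweets = [
        "The team is winning the match tonight",
        "Stocks are falling and markets are worried",
        "Great match, the team played well",
        "Markets rally as stocks recover",
    ]
    vectors = tfidf([preprocess(t) for t in tweets])
    print(top_features(vectors, 5))
```

A term that occurs in every document gets an IDF of log(1) = 0, so it can never be ranked as a distinguishing feature. The selected terms could then serve as the vocabulary for vectors passed to a clustering algorithm; that clustering step, and the deep learning alternatives the paper compares, are not shown here.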

Volume 11 | 04-Special Issue

Pages: 32-40