This paper focuses on document categorization of the 20 Newsgroups dataset using a latent Dirichlet allocation (LDA) model. The main goal is to build document clusters automatically, so that similar documents fall into the same cluster and dissimilar documents fall into different clusters. We apply LDA to document categorization so that clusters form without human intervention. First, we collect the documents and merge them into a single corpus; we then fit an LDA model to that corpus. LDA is a generative probabilistic model that assumes each topic is a probability distribution over an underlying set of words and each document is a mixture over a set of topics. Finally, we obtain document clusters without manually assigning documents to clusters. The experimental results show that document clustering using latent Dirichlet allocation can improve automatic document categorization.
Volume 12 | Issue 6
Pages: 1450-1458
DOI: 10.5373/JARDCS/V12I2/S20201342
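The sketch below illustrates the pipeline described in the abstract: fit LDA to a corpus, then group documents into clusters. It is not the authors' implementation. It assumes collapsed Gibbs sampling as the inference method and assigns each document to its dominant topic. The function names `lda_gibbs` and `cluster_by_dominant_topic`, the hyperparameters `alpha`, `beta`, and `iters`, the topic count `num_topics`, and the four toy documents are all hypothetical choices made for illustration. The toy documents are not taken from 20 Newsgroups.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word tokens.
    Returns theta: per-document topic proportions (each row sums to 1).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens assigned to each topic
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w2id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = w2id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1

    # Smoothed document-topic proportions.
    return [
        [(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]

def cluster_by_dominant_topic(theta):
    """Assign each document to the topic with the largest proportion."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

# Hypothetical toy corpus with two themes (space and hockey).
docs = [
    "orbit rocket launch nasa orbit".split(),
    "rocket launch shuttle orbit nasa".split(),
    "hockey goal puck team season".split(),
    "puck goal team hockey playoff".split(),
]
theta = lda_gibbs(docs, num_topics=2, iters=100)
labels = cluster_by_dominant_topic(theta)
print(labels)
```

Using the dominant topic as a hard cluster label is one simple way to turn LDA's soft document-topic mixtures into clusters. Other strategies, such as clustering the `theta` vectors themselves, are also possible. Note that this sketch still requires the number of topics to be supplied as a parameter.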