Efficient Retrieval of HTML Documents using Hybrid PSO and Hybrid ACO in Web Document Clustering


With the rapid growth in the number of web documents on the WWW, organizing, analyzing and presenting these documents is becoming increasingly difficult. Web search usually relies on features extracted only from the text of a web page, although the tag information in HTML documents has been found to improve the performance of information retrieval systems. As the volume of data on the World Wide Web increases day by day, finding the required information becomes a significant challenge. This creates a need for new approaches that help users navigate, summarize and organize that information. One technique that can serve this goal is web document clustering. However, existing partitional clustering techniques suffer from the local optima problem. Various efforts have been made to address this drawback, including the use of meta-heuristic approaches. In this research work, we present a document clustering technique that uses HTML tags together with Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO). The hybrid PSO + K-means and ACO + K-means algorithms are used to cluster the web documents, and the results of the proposed approaches are analyzed on the WEBKB dataset.
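
As an illustration of the hybrid idea only, the following is a minimal Python sketch and not the authors' implementation. It assumes that terms are weighted by the HTML tag they appear in, that PSO searches for good initial centroids by minimizing the sum of squared errors, and that K-means then refines the best centroids found. The tag weights, the toy documents, the vocabulary, the PSO parameters and all function names are hypothetical; the abstract does not specify them. The ACO variant is not shown.

```python
import random

# Hypothetical tag weights: terms in <title> count more than terms in <body>.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}
VOCAB = ["course", "exam", "faculty", "research"]
# Toy documents (assumed data): each maps a tag name to the tokens found in it.
DOCS = [
    {"title": ["course"], "body": ["exam", "course"]},
    {"title": ["course"], "body": ["exam"]},
    {"title": ["faculty"], "body": ["research"]},
    {"title": ["faculty"], "body": ["research", "faculty"]},
]

def tag_weighted_vector(doc, vocab):
    """Term-frequency vector in which each occurrence is scaled by its tag weight."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for tag, tokens in doc.items():
        for tok in tokens:
            if tok in index:
                vec[index[tok]] += TAG_WEIGHTS.get(tag, 1.0)
    return vec

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def sse(points, centroids):
    """Fitness: sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_seed(points, k, n_particles=10, iters=30, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over sets of k centroids; returns the best set found."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, started at k distinct points.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [sse(points, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest

def kmeans(points, centroids, iters=20):
    """Standard K-means refinement starting from the given centroids."""
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else old[:])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)

vectors = [tag_weighted_vector(d, VOCAB) for d in DOCS]
centroids, labels = kmeans(vectors, pso_seed(vectors, k=2))
print(labels)
```

In this sketch the swarm explores several candidate centroid sets at once, which is one way to reduce the chance that K-means starts from a poor initialization and settles in a local optimum.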

Volume 11 | 02-Special Issue

Pages: 1856-1865