Web usage mining analyzes the web log files collected on web servers in order to discover patterns. Web mining mainly focuses on two stages: preprocessing and clustering. A cluster is difficult to define precisely, which is why many different types of clustering algorithms exist. What these algorithms have in common is the existence of a group of files. Research in data mining is carried out using various clustering models, and each of these models has its own algorithms. Some of the popular models used for cluster analysis are centroid models, connectivity models, distribution models, subspace models, group models and graph-based models. In the present work we focus on the K-means algorithm with the cosine similarity measure in place of the Euclidean distance measure commonly used by conventional K-means. K-means groups similar data points together in order to discover behavioral patterns of browsing. To achieve this grouping, K-means looks for a fixed number K of clusters in a dataset. The K-means method is then trained to help identify and analyze user clickstream data. The results of this research show that K-means, when used with the cosine similarity measure, is able to cluster and classify the web log data with an improvement in performance. K-means also improved the analysis of clusters, which in turn enhanced the process of prediction.
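As an illustration of the approach summarized above, the following is a minimal sketch of K-means with cosine similarity (often called spherical K-means), not the implementation used in this work. The function name `cosine_kmeans`, the toy session-by-page count matrix, and the deterministic farthest-first initialization are all assumptions made for the example; the abstract does not specify how centroids are initialized or how sessions are encoded.

```python
import numpy as np

def cosine_kmeans(X, k, n_iter=100):
    """Cluster the rows of X into k groups by cosine similarity (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    # Scale every row to unit length so that a dot product equals cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)

    # Assumed initialization: start from the first row, then repeatedly add the
    # row that is least similar to the centroids chosen so far (farthest-first).
    chosen = [Xn[0]]
    for _ in range(1, k):
        best_sim = np.max(Xn @ np.array(chosen).T, axis=1)
        chosen.append(Xn[np.argmin(best_sim)])
    centroids = np.array(chosen)

    labels = np.full(len(Xn), -1)
    for _ in range(n_iter):
        # Assignment step: each row joins the centroid it is most similar to.
        new_labels = (Xn @ centroids.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: new centroid is the re-normalized sum of its members.
        for j in range(k):
            total = Xn[labels == j].sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:
                centroids[j] = total / length
    return labels, centroids

# Hypothetical data: rows are user sessions, columns are page-visit counts.
sessions = np.array([
    [5, 3, 0, 0],
    [10, 6, 0, 0],
    [4, 2, 1, 0],
    [0, 0, 4, 5],
    [0, 1, 8, 9],
    [0, 0, 2, 3],
])
labels, centroids = cosine_kmeans(sessions, k=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this toy data the first two sessions visit the same pages in the same proportions but with different total counts. Cosine similarity treats them as identical, whereas Euclidean distance would separate them by session length, which is one plausible reason to prefer cosine similarity for clickstream data.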
Volume 11 | 04-Special Issue
Pages: 1358-1363