Performance Optimization of Multi-level Parallel Fuzzy C-means Algorithm Based on Multi-machine Clustering Technique for Heterogeneous Data

*Manisha B. Kumbhar, Rajesh S. Prasad

Alleviating scalability and performance issues in clustering algorithms is an emerging need, driven by the growing size and complexity of the data being produced. In addition, combining information from various sources into a single data format and processing it has become one of the most challenging tasks in data mining. Most clustering algorithms require heavy computation to process such data. The fuzzy C-means (FCM) clustering algorithm has shown better clustering quality, but its computational cost becomes high as the data grows. To overcome these issues, this paper investigates PFCM, a parallel fuzzy C-means clustering algorithm based on the scalable multi-machine clustering technique (MMCT), for an optimal solution. The algorithm is parallelized using MMCT and a client-server model. The performance of two parallel clustering algorithms, FCM-Fork-join and PFCM, is compared using the speed-up and scale-up metrics. Furthermore, a scalability analysis is conducted to demonstrate the performance of PFCM with an increasing number of machines and with datasets of different sizes. The accuracy of both FCM-Fork-join and PFCM is measured with the partition coefficient and exponential separation (PCAES) validity index. The speed-up of the proposed PFCM is observed to be significantly higher than that of FCM-Fork-join.
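
For readers unfamiliar with the two comparison metrics, the following is a minimal sketch of how speed-up and scale-up are commonly computed from measured runtimes. It assumes the conventional definitions (speed-up as single-machine time over p-machine time on the same data; scale-up as the time for one machine on data of size n over the time for p machines on data of size p·n); the abstract does not state which formulas the paper uses. The function names and the timings in the usage note are hypothetical and are not results from the paper.

```python
def speedup(t_one_machine, t_p_machines):
    """Single-machine runtime divided by p-machine runtime on the same dataset.

    Assumed conventional definition; not taken from the paper.
    """
    if t_one_machine <= 0 or t_p_machines <= 0:
        raise ValueError("runtimes must be positive")
    return t_one_machine / t_p_machines


def scaleup(t_base, t_scaled):
    """Runtime for (1 machine, data size n) divided by runtime for
    (p machines, data size p * n). A value near 1.0 means the system
    absorbs proportionally larger data in roughly the same time.

    Assumed conventional definition; not taken from the paper.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("runtimes must be positive")
    return t_base / t_scaled


def efficiency(t_one_machine, t_p_machines, p):
    """Speed-up normalized by the number of machines p."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_one_machine, t_p_machines) / p
```

With hypothetical timings of 120.0 s on one machine and 40.0 s on four machines, `speedup(120.0, 40.0)` gives 3.0 and `efficiency(120.0, 40.0, 4)` gives 0.75.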

Volume 11 | 02-Special Issue

Pages: 1533-1550