High-dimensional data pose challenges such as the empty space phenomenon and the concentration of distances. The sparsity of the data and the dissimilarity of objects in high-dimensional settings make the computational cost high. Sparsity hampers density-based approaches because it leads to poor density estimates. As dimensionality increases, distances between data points become difficult to distinguish, which causes problems for distance-based approaches. Clustering large data sets is a significant operation in data mining that groups features that are similar to each other. In this paper we propose a novel technique, referred to as the Affine Subspace Clustering Based on Hubness (ASCBH) algorithm, that combines subspace formation with the hubness property to handle the challenges of high-dimensional settings. We use a feature weighting method to obtain the relevant subspace of the given data space, and we apply the hubness property of high-dimensional data to obtain the hubs of this data space and to eliminate outliers. The data points with the highest hubness scores are identified and used as the basis for further clustering of the data. The experimental results show that the proposed system achieves improved performance in high-dimensional settings. The performance studies show that the mean intra-cluster scattering decreases and the mean inter-cluster distance increases significantly, while cluster centre accuracy is optimized.
Volume 11 | 06-Special Issue
Pages: 861-869
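
The hubness scoring step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common k-occurrence definition of hubness (the number of times a point appears among the k nearest neighbours of other points) and Euclidean distance; the abstract does not state either choice, so both are assumptions. The function names `hubness_scores` and `top_hubs` are hypothetical, and the sketch is not the ASCBH algorithm itself: the feature weighting and subspace formation steps are omitted.

```python
import math
from typing import List, Sequence


def hubness_scores(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return, for each point, how many other points list it among
    their k nearest neighbours (k-occurrence count; assumed definition)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    scores = [0] * n
    for i in range(n):
        # Euclidean distances from point i to every other point (assumed metric).
        others = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        others.sort()  # ties are broken by the lower index
        for _, j in others[:k]:
            scores[j] += 1
    return scores


def top_hubs(scores: Sequence[int], n_hubs: int) -> List[int]:
    """Indices of the n_hubs points with the highest hubness scores."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:n_hubs]


# Four corners of a unit square, its centre, and one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = hubness_scores(points, k=2)
hubs = top_hubs(scores, n_hubs=1)
```

In this toy data set the centre of the square receives the highest score and the distant point receives a score of zero, which is the kind of signal that could be used to select hubs as clustering seeds and to flag outlier candidates.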