An Evolutionary Architecture for High Dimensional Data Optimization to Remove Data Redundancy

Kamaljeet Kaur, Atul Garg

Big data has long attracted the attention of researchers and scientists because of its complex storage architecture and data management issues. This paper considers the problem of raw data management where no reference cluster is available for the data files. The proposed algorithm introduces a new relation finder that combines the Cosine relation and the Dice Similarity index. A new clustering approach is designed and then cross-validated using a Feed Forward Back Propagation Neural Network, which also helps in optimizing redundancy. The cross-validation architecture is verified by Mean Square Error. A labelled dataset from Kaggle is used to evaluate the proposed work. The dataset contains a maximum of 7 attributes; hence attribute precision, attribute recall and attribute F-measure are calculated for the proposed algorithm. A comparative analysis against the work of Y. Djenouri et al. shows the effectiveness of the proposed work, with an enhancement of 22.10%.
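As an illustration of the relation finder named in the abstract, the sketch below computes a cosine similarity and a Dice index between two tokenized records and blends them. The abstract does not say how the two measures are combined, so the weighted mean in `combined_relation`, the `weight` parameter, and all function names are assumptions made for this example only and should not be read as the authors' method.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both records contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_similarity(tokens_a, tokens_b):
    """Dice index on the sets of distinct tokens: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def combined_relation(tokens_a, tokens_b, weight=0.5):
    """Hypothetical combination (an assumption): weighted mean of both scores."""
    cos = cosine_similarity(tokens_a, tokens_b)
    dice = dice_similarity(tokens_a, tokens_b)
    return weight * cos + (1 - weight) * dice
```

Calling `combined_relation(["big", "data", "storage"], ["big", "data", "cluster"])` scores two records that share two of their three tokens. In a redundancy-removal setting, records whose score exceeds a chosen threshold could be treated as candidates for the same cluster; the threshold is likewise not given in the abstract.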

Volume 11 | 05-Special Issue

Pages: 2057-2069