A Performance Study of Naïve Bayes Classifier on High Dimensional Dataset

Priya Mohan and P. Ilango

Data mining is the process of extracting useful patterns from historical data. It is widely used in real-life applications such as search engines, fraud detection, speech recognition, and healthcare. Machine learning algorithms are used in data mining to predict future events based on patterns derived from historical data. However, not all of the features acquired during data collection are highly relevant to the target class. Feature selection is the process of selecting the best subset of features in a dataset to improve the performance of a data mining or machine learning algorithm. In this paper, an empirical study is conducted on the Naïve Bayes classifier using the Pima Indian Type II Diabetes dataset, both with all of its features and with subsets of features selected by predefined Python libraries. The performance of the Naïve Bayes classifier is evaluated on each feature subset of the dataset to study the impact of high dimensionality on its performance.
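The experimental setup can be sketched as follows: train a Gaussian Naïve Bayes classifier on all features, then retrain it on the top-k features ranked by a filter-style relevance score and compare test accuracy. This is a minimal, dependency-free sketch under stated assumptions, not the authors' method. The paper does not name its Python libraries, so a hand-rolled one-way ANOVA F-score ranking stands in for them. The data is synthetic, with two informative features and six noise features, not the Pima dataset. All function names (`fit_gaussian_nb`, `select_k_best`, `make_data`, and so on) are hypothetical.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    model, n = {}, len(y)
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9  # small term avoids zero variance
                     for col, m in zip(zip(*rows), means)]
        model[c] = (len(rows) / n, means, variances)
    return model

def predict(model, x):
    """Pick the class with the highest log-posterior, treating features as independent."""
    best_c, best_score = None, -math.inf
    for c, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def f_score(X, y, j):
    """One-way ANOVA F statistic of feature j across classes (a filter relevance score)."""
    col = [x[j] for x in X]
    grand = sum(col) / len(col)
    classes = sorted(set(y))
    between = within = 0.0
    for c in classes:
        vals = [x[j] for x, t in zip(X, y) if t == c]
        m = sum(vals) / len(vals)
        between += len(vals) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in vals)
    k, n = len(classes), len(col)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)

def select_k_best(X, y, k):
    """Indices of the k features with the highest F score."""
    scores = [f_score(X, y, j) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def project(X, cols):
    return [[x[j] for j in cols] for x in X]

def make_data(n, n_noise, rng):
    """Hypothetical synthetic data: features 0 and 1 depend on the label, the rest are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0), rng.gauss(-1.5 * label, 1.0)]
        X.append(informative + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(500, 6, rng)
X_test, y_test = make_data(500, 6, rng)

full = fit_gaussian_nb(X_train, y_train)
print("all 8 features:", accuracy(full, X_test, y_test))

for k in (2, 4, 6):
    cols = select_k_best(X_train, y_train, k)
    sub = fit_gaussian_nb(project(X_train, cols), y_train)
    print(f"top {k} features {cols}:", accuracy(sub, project(X_test, cols), y_test))
```

In practice, a library filter such as scikit-learn's `SelectKBest` with `f_classif`, combined with `GaussianNB`, would replace the hand-written pieces. The loop over `k` mirrors the idea of evaluating the classifier on each feature subset.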

Volume 11 | 04-Special Issue

Pages: 1330-1338