An Enhanced Artificial Neural Network based Optical Character Recognition Mechanism for Business Information Extraction and Classification

N. Sharmili and N. Swapna

Automated text processing and information extraction have gained the attention of many researchers because they play a vital role in business information processing. Artificial intelligence technologies based on neural networks can help address this problem. This paper proposes a neural network and natural language processing based approach for handwritten and optical character recognition. The proposed methodology combines Optical Character Recognition (OCR) with a Named Entity Recognition (NER) model for classification. The OCR produces text for a given image (a business card), which is further classified by a well-trained NLP-NER model to extract names and other details such as emails, phone numbers, and websites. The obtained results indicate that the proposed method classifies text with high efficiency in spite of the unstructured text and lack of sentence formation in text extracted from business cards. The results were further improved by the Scikit-Learn Classifier, achieving 97.5% accuracy on a significantly large dataset.
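The field-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the OCR stage has already produced plain text, and it swaps the trained NER model for simple regular expressions, purely to show how unstructured business-card lines might be sorted into categories. The names `classify_line`, `extract_fields`, and the three patterns are hypothetical.

```python
import re

# Hypothetical patterns; a trained NER model would replace these rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
WEBSITE_RE = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def classify_line(line):
    """Assign one OCR text line to a coarse category."""
    text = line.strip()
    # Check emails first so that an address is not mistaken for a website.
    if EMAIL_RE.search(text):
        return "email"
    if WEBSITE_RE.search(text):
        return "website"
    if PHONE_RE.search(text):
        return "phone"
    # Names, job titles, and addresses land here; the paper uses NER for these.
    return "other"


def extract_fields(ocr_text):
    """Group the non-empty lines of OCR output by category."""
    fields = {"email": [], "website": [], "phone": [], "other": []}
    for line in ocr_text.splitlines():
        if line.strip():
            fields[classify_line(line)].append(line.strip())
    return fields
```

Calling `extract_fields` on the text of a card returns a dictionary with one list per category. Lines that match none of the patterns, such as a person's name, fall into `"other"`, which is where a learned NER model is needed, since business-card text lacks the sentence structure that rule-based approaches rely on.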

Volume 11 | 10-Special Issue

Pages: 13-19

DOI: 10.5373/JARDCS/V11SP10/20192772