A Novel Approach for Handwritten Tamil Character Recognition System


S. Anbukkarasi and S. Varadhaganapathy
Abstract

Optical Character Recognition (OCR) is the technology by which a system identifies printed or handwritten text characters in a scanned image. Character recognition systems already exist for languages such as German, English and French, whose scripts consist of isolated characters. Although Tamil is rich in grammar and literature and is agglutinative in nature, very little work has been carried out on it in the OCR and ICR (Intelligent Character Recognition) research area. This paper carries out the initial steps toward recognizing handwritten Tamil characters. Isolated Tamil characters of various subjects were collected in unconstrained form and scanned using a flatbed scanner. The scanned documents are preprocessed with a binary conversion technique that removes noise, and the lines are segmented using the horizontal profile technique. Each character on the scanned page is then extracted from the segmented lines using vertical projection. Features are extracted from the segmented characters, which are then classified: zoning is used for feature extraction and a neural network for classification. Finally, the recognized characters are converted into editable text with an accuracy of 87%.
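The segmentation and feature extraction steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed threshold of 128, the 4 x 4 zoning grid, the ink-density feature, and all function names are assumptions chosen for illustration, and the neural network classifier is omitted.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Return a 0/1 array where 1 marks ink (pixels darker than the assumed threshold)."""
    return (np.asarray(gray) < threshold).astype(np.uint8)

def runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero profile values."""
    mask = np.concatenate(([0], (np.asarray(profile) > 0).astype(int), [0]))
    diff = np.diff(mask)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def segment_lines(binary):
    """Split a page into text lines using the horizontal profile (ink count per row)."""
    return [binary[top:bottom] for top, bottom in runs(binary.sum(axis=1))]

def segment_characters(line):
    """Split a text line into characters using the vertical projection (ink count per column)."""
    return [line[:, left:right] for left, right in runs(line.sum(axis=0))]

def zoning_features(char, grid=(4, 4)):
    """Divide a character image into grid zones and return the ink density of each zone."""
    row_groups = np.array_split(np.arange(char.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(char.shape[1]), grid[1])
    features = []
    for rows in row_groups:
        for cols in col_groups:
            zone = char[np.ix_(rows, cols)]
            # Zones can be empty when the character is smaller than the grid.
            features.append(float(zone.mean()) if zone.size else 0.0)
    return np.array(features)

if __name__ == "__main__":
    # Synthetic grayscale page: white background, two lines of dark blobs.
    page = np.full((12, 12), 255)
    page[1:4, 1:3] = 0
    page[1:4, 5:8] = 0
    page[7:10, 2:6] = 0
    for line in segment_lines(binarize(page)):
        for char in segment_characters(line):
            print(char.shape, zoning_features(char).shape)
```

The resulting feature vectors would serve as input to a classifier. Projection-based splitting of this kind assumes that lines and characters are separated by blank rows and columns, which holds for isolated characters but not for touching or overlapping handwriting.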

Volume 12 | 03-Special Issue

Pages: 1489-1495

DOI: 10.5373/JARDCS/V12SP3/20201401