Enactment of Phonetic Algorithms on Twitter Dataset

Monika Arora, Vineet Kansal and Abhishek Swaroop

With the advance of technology, social media has become a significant mode of communication. Micro-blogs, tweets, comments, reviews, and discussion forums are widely used weblogs where people express themselves freely, sharing ideas and opinions about the products and services they use. These weblogs form a hub of unstructured data containing information that data analysts and organizations can use to assess product popularity. However, they are usually written in an informal style with abbreviations, misspelled words, and homophones, which makes the text noisy and difficult to analyze. Such text therefore needs to be normalized during pre-processing. Many of the noisy words are homophones, that is, words that sound like the intended word but are spelled differently, which has led researchers to use phonetic algorithms for normalization. In this paper, we analyze several well-known phonetic algorithms and compare their ability to normalize these similar-sounding words to the original words on Twitter data. We compare the count of correctly labelled words before and after applying each phonetic algorithm. This performance evaluation on web data helps researchers select an algorithm appropriate to their needs.
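
As an illustration of the before-and-after comparison described above, the following minimal Python sketch uses Soundex, one widely known phonetic algorithm. The abstract does not name the algorithms that were evaluated, so the choice of Soundex is an assumption. The vocabulary, the noisy tokens, and the helper names (`soundex`, `normalize`, `count_correct`) are hypothetical and are not taken from the Twitter dataset used in the paper.

```python
# Illustrative sketch only: American Soundex plus a toy before/after count.
# The word lists below are made-up examples, not data from the paper.

# Map each consonant to its Soundex digit; vowels, y, h and w have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated digit is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def normalize(token, vocabulary, index):
    """Replace a token with the vocabulary word that shares its Soundex code."""
    if token in vocabulary:
        return token
    return index.get(soundex(token), token)


def count_correct(tokens, gold):
    """Count positions where the token equals the intended (gold) word."""
    return sum(1 for t, g in zip(tokens, gold) if t == g)


# Hypothetical vocabulary and noisy tokens with their intended words.
vocabulary = ["good", "love", "tomorrow", "great", "night"]
index = {soundex(w): w for w in vocabulary}

noisy = ["gud", "luv", "tomoro", "great", "nite"]
gold = ["good", "love", "tomorrow", "great", "night"]

normalized = [normalize(t, vocabulary, index) for t in noisy]
before = count_correct(noisy, gold)
after = count_correct(normalized, gold)
print("correct before:", before, "correct after:", after)
```

In this toy example, "nite" is not recovered because Soundex assigns it a different code than "night" (the "g" contributes a digit). Differences of this kind are what a comparison across several phonetic algorithms would bring out.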

Volume 11 | 04-Special Issue

Pages: 224-235