Automatic Segmentation of Kannada Speech for Emotion Conversion


A. Geethashree and D.J. Ravi
Abstract

Segmenting continuous speech into words or sub-words is important in many speech processing applications, such as speech recognition and synthesis. This paper addresses automatic word and sub-word boundary detection in continuous Kannada speech, with and without noise. The segmentation algorithm is based on both time-domain parameters (Short Time Energy, Intensity and Short Time Zero Crossing Rate) and a frequency-domain parameter (Spectral Centroid). After these parameters are extracted, a dynamic thresholding technique, implemented in Matlab, detects the edges of words or sub-words from their analysis. The proposed segmentation algorithm is used in a Word Based Emotion Conversion Algorithm, which uses the Gaussian Normalization Equation to predict the pitch contour of the target emotional speech. Gaussian Normalization is applied for both sentence-level and word-level conversion. The converted emotional speech is evaluated using a subjective test. The test results show that word-level emotion conversion achieves a higher MOS (Mean Opinion Score) and recognition rate than sentence-level conversion. The Kannada Emotional Speech (KES) database is used for analysis and evaluation of both the segmentation and the emotion conversion algorithms. The segmentation algorithm is tested in a clean environment, in a noisy environment (data from the KES database with additive Gaussian noise), and on online sentences with fan noise. The segmentation results show that the Average Segmentation Accuracy Rate (SAR) is 97.04% for clean speech and 89.5% for speech with additive noise.
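
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' Matlab implementation: it uses Python with only Short Time Energy (Intensity, Zero Crossing Rate and Spectral Centroid are omitted), and the threshold rule, the `weight` parameter, the frame sizes and all function names are assumptions made for illustration. The pitch mapping uses the mean-variance form commonly called Gaussian normalization; the abstract does not give the paper's exact equation.

```python
import math


def short_time_energy(signal, frame_len, hop):
    """Mean squared amplitude of each overlapping frame."""
    return [
        sum(x * x for x in signal[s:s + frame_len]) / frame_len
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


def dynamic_threshold(values, weight=0.1):
    """Assumed rule: threshold adapts to the utterance, sitting a
    fraction `weight` of the way from the minimum to the mean."""
    lowest = min(values)
    mean = sum(values) / len(values)
    return lowest + weight * (mean - lowest)


def segment(signal, frame_len=200, hop=100, weight=0.1):
    """Return (start, end) sample ranges whose frame energy exceeds
    the dynamic threshold; each range is a candidate word or sub-word."""
    energies = short_time_energy(signal, frame_len, hop)
    thr = dynamic_threshold(energies, weight)
    segments, first = [], None
    for i, energy in enumerate(energies):
        if energy > thr and first is None:
            first = i                      # rising edge
        elif energy <= thr and first is not None:
            segments.append((first * hop, (i - 1) * hop + frame_len))
            first = None                   # falling edge
    if first is not None:                  # speech runs to the end
        segments.append((first * hop, (len(energies) - 1) * hop + frame_len))
    return segments


def gaussian_normalize_pitch(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 values onto the target emotion's F0 statistics.
    Non-positive values (assumed to mark unvoiced frames) are kept."""
    return [
        f if f <= 0 else tgt_mean + (tgt_std / src_std) * (f - src_mean)
        for f in f0
    ]


def tone(n):
    """Synthetic stand-in for a word: unit sine, 20-sample period."""
    return [math.sin(2 * math.pi * k / 20) for k in range(n)]


# Two synthetic "words" separated by silence.
demo_signal = [0.0] * 400 + tone(800) + [0.0] * 400 + tone(600) + [0.0] * 400
demo_segments = segment(demo_signal)
```

In a word-level scheme such as the one the abstract describes, `gaussian_normalize_pitch` would be applied separately to the F0 values inside each range returned by `segment`, with statistics estimated per word; in a sentence-level scheme it would be applied once over the whole utterance.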

Volume 11 | 07-Special Issue

Pages: 1588-1604