Asian Research Thesis Index

Abstract

Speech recognition has been an active area of research for the last few decades, and a number of sophisticated methods have been developed in recent years to improve the recognition rate. A speech recognition system consists of two main components: the front-end and the back-end. This thesis introduces new front-end methods that achieve a higher recognition rate. For the front-end, it proposes novel spectral features for speech recognition. More specifically, it replaces the traditional state-of-the-art feature extraction technique, mel frequency cepstral coefficients (MFCC), with an adaptive mel filter bank. This cognitively inspired feature extraction approach constructs the filter bank adaptively after sensing the spectrum of the input signal. Beyond improving the performance of automatic speech recognition (ASR) systems, this work contributes to three main directions of the ASR field.

The first facet concerns improving spectrogram visualization through adaptive window size selection. The short-time Fourier transform (STFT) is a well-known technique for the time-frequency analysis of non-stationary signals, but selecting an appropriate window size becomes difficult when no background information about the input signal is available. This work proposes a novel empirical model that uses a spectrum sensing technique to select the window size adaptively for narrowband signals. Because a fixed model is unsuitable for wideband signals, the proposed model adopts the constant-Q transform (CQT), which, unlike the STFT, provides a varying time-frequency resolution. The proposed model not only improves spectrogram visualization but also reduces computational cost, and it selects the appropriate window length in 87.71% of cases. Besides feature extraction from speech signals, the model is equally applicable to biomedical, music, and radio signals.
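For reference, the standard MFCC front-end that the adaptive mel filter bank builds on can be sketched as follows. This is a minimal NumPy sketch of the conventional pipeline, not the thesis's adaptive method: the triangular filters here are spaced uniformly on the mel scale regardless of the input spectrum. The parameter values (0.97 pre-emphasis, 400-sample frames with a 160-sample hop at 16 kHz, 512-point FFT, 26 filters, 13 coefficients) are common defaults assumed for illustration. The fixed frame length is the kind of window size that the first facet's model would instead select adaptively.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs, f_min=0.0, f_max=None):
    """Fixed (non-adaptive) triangular filters spaced uniformly on the mel scale."""
    f_max = fs / 2.0 if f_max is None else f_max
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):    # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):   # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, fs, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Fixed-length Hamming-windowed frames (an STFT with a constant window size).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Mel filter bank energies, then log compression.
    energies = power @ mel_filter_bank(n_filters, n_fft, fs).T
    log_e = np.log(np.maximum(energies, 1e-10))
    # Orthonormal DCT-II written as a matrix; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_e @ dct.T

# Usage: one second of a noisy 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.default_rng(0).standard_normal(fs)
feats = mfcc(x, fs)
print(feats.shape)  # (98, 13)
```

An adaptive variant would change only how `mel_filter_bank` places its filter edges (for example, after inspecting the input spectrum); the framing, log, and DCT stages would stay the same.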
The second facet relates to a commercial application of speech recognition. This thesis presents a novel method that automatically identifies hearing impairment using a cognitively inspired feature extraction and speech recognition approach. To the best of the authors' knowledge, this is the first attempt to automate pure tone and speech audiometry testing based on speech recognition. The proposed method uses an adaptive filter bank with weighted mel frequency cepstral coefficients for feature extraction, and classification is performed with a well-known statistical pattern recognition technique, the hidden Markov model (HMM). Performance evaluation against the ground truth (expert audiologists' results) and comparison with current state-of-the-art techniques show that the proposed method achieves comparable results automatically. Specifically, compared with the expert audiologists' results, the overall absolute error of the proposed model is less than 4.9 dB for pure tone audiometry and less than 4.4 dB for speech audiometry. The overall accuracy achieved by the proposed method is 96.67%.

The third facet concerns applying the proposed feature extraction model to dialect recognition in a low-resource local language. Traditional dialect recognition methods based on MFCC and the discrete wavelet transform (DWT) work well for high-resource languages, but their accuracy is lower for low-resource languages. This thesis presents a new approach to Pashto dialect recognition that extracts features using an adaptive filter bank in MFCC and DWT, followed by classification with statistical pattern matching (HMM) and with the machine learning classifiers k-nearest neighbors (KNN) and support vector machine (SVM). The three proposed models are tested and compared with state-of-the-art techniques, and the proposed method achieves an overall accuracy of 88%.
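HMM-based classification of the kind described above typically scores an utterance's feature sequence against one trained model per class (for example, per dialect) and picks the model with the highest likelihood. The sketch below shows that scoring step with the forward algorithm in the log domain. It assumes Gaussian emissions with diagonal covariance; the abstract does not specify the HMM topology or emission model, training (for example, Baum-Welch) is omitted, and the two models in the usage lines are hypothetical toy parameters.

```python
import numpy as np

def log_gaussian_diag(x, means, variances):
    """log N(x_t | mean_j, diag(var_j)) for every frame t and state j: (T, D) -> (T, N)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid inf - inf when a slice is all -inf
    with np.errstate(divide="ignore"):
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def forward_log_likelihood(x, log_pi, log_A, means, variances):
    """log P(x | model) computed with the forward algorithm in the log domain."""
    log_b = log_gaussian_diag(x, means, variances)
    alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log A[i, j]) + log b_j(x_t)
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(alpha, axis=0)

def classify(x, models):
    """Return the label whose HMM gives the highest log-likelihood, plus all scores."""
    scores = {label: forward_log_likelihood(x, *params) for label, params in models.items()}
    return max(scores, key=scores.get), scores

# Usage with two hypothetical 2-state models over 2-D feature vectors.
log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
var = np.ones((2, 2))
models = {
    "low": (log_pi, log_A, np.array([[0.0, 0.0], [1.0, 1.0]]), var),
    "high": (log_pi, log_A, np.array([[5.0, 5.0], [6.0, 6.0]]), var),
}
x = np.full((10, 2), 0.5)
label, scores = classify(x, models)
print(label)  # low
```

Working in the log domain keeps long utterances from underflowing to zero probability; KNN or SVM classifiers would instead operate on fixed-length feature vectors rather than on sequences.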
