Spectral Feature Extraction With Adaptive Mel Filter Bank for Speech Recognition

Thesis Info

Access Option

External Link

Author

Nisar, Shibli

Program

PhD

Institute

National University of Computer and Emerging Sciences

City

Islamabad

Province

Islamabad

Country

Pakistan

Thesis Completing Year

2018

Thesis Completion Status

Completed

Subject

Electrical Engineering

Language

English

Link

http://prr.hec.gov.pk/jspui/bitstream/123456789/12630/1/Shibli%20Nisar%20FAST%20NU.pdf

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727838790

Speech recognition has been an active area of research for the last few decades, and a number of sophisticated methods have been developed in recent years to improve the recognition rate. A speech recognition system consists of two main components: the front end and the back end. This thesis introduces new front-end methods that achieve a higher recognition rate. Specifically, it proposes novel spectral features that replace the traditional state-of-the-art feature extraction technique, mel frequency cepstral coefficients (MFCC), with an adaptive mel filter bank: a cognitively inspired feature extraction approach that constructs the filter bank adaptively after sensing the spectrum of the input signal. This work not only improves the performance of automatic speech recognition (ASR) systems but also contributes to the ASR field in three main directions.

The first facet concerns improving spectrogram visualization through adaptive window size selection. The short-time Fourier transform (STFT) is a well-known technique for time-frequency analysis of non-stationary signals. Selecting an appropriate window size becomes difficult when no background information about the input signal is available. This work proposes a novel empirical model that selects the window size adaptively for a narrow-band signal using a spectrum sensing technique. Because a fixed model is undesirable for wide-band signals, the proposed model adopts the constant-Q transform (CQT), which, unlike the STFT, provides a varying time-frequency resolution. The proposed model not only improves spectrogram visualization but also reduces the computational cost, and it selects the appropriate window length in 87.71% of cases. Beyond feature extraction from speech signals, the model is equally useful for other signals such as biomedical, music, and radio signals.
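
The window-size trade-off described above can be illustrated with a short sketch. It is not the thesis's empirical model, whose details the abstract does not give. It only computes the textbook quantities involved: the fixed time and frequency resolution of a single STFT window, and the per-bin window lengths of a constant-Q analysis. The function names and the parameters `f_min` and `bins_per_octave` are hypothetical and chosen for illustration.

```python
import math


def stft_resolution(window_length, sample_rate):
    """Return (time_resolution_s, frequency_resolution_hz) for one STFT window.

    A longer window sharpens frequency resolution but blurs time
    resolution, and the trade-off is the same at every frequency.
    """
    time_res = window_length / sample_rate
    freq_res = sample_rate / window_length
    return time_res, freq_res


def cqt_window_lengths(f_min, bins_per_octave, n_bins, sample_rate):
    """Return one window length (in samples) per constant-Q bin.

    Assumption: the usual constant-Q definition, where the ratio of
    center frequency to bandwidth (Q) is the same for every bin, so
    the window shrinks as the center frequency rises.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    lengths = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometric spacing
        lengths.append(math.ceil(q * sample_rate / f_k))
    return lengths


if __name__ == "__main__":
    print(stft_resolution(512, 16000))
    print(cqt_window_lengths(100.0, 12, 24, 16000))
```

In this sketch the STFT resolution is identical for all frequencies, whereas the constant-Q window length roughly halves with every octave, which is the varying time-frequency resolution the abstract refers to.
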
The second facet relates to a commercial application of speech recognition. This thesis presents a novel idea that automatically identifies hearing impairment using a cognitively inspired feature extraction and speech recognition approach. To the best of the authors' knowledge, this is the first attempt to automate pure tone and speech audiometry testing based on speech recognition. The proposed method uses an adaptive filter bank with weighted mel frequency cepstral coefficients for feature extraction. Classification is performed using a well-known statistical pattern recognition technique, the hidden Markov model (HMM). Performance evaluation and comparison with the ground truth (expert audiologist results) and current state-of-the-art techniques show that the proposed method can achieve comparable results automatically. Specifically, the overall absolute error of the proposed model relative to the expert audiologist results is less than 4.9 dB for pure tone audiometry and less than 4.4 dB for speech audiometry. The overall accuracy achieved by the proposed method is 96.67%.

The third facet concerns applying the proposed feature extraction model to dialect recognition for a low-resource local language. Traditional dialect recognition methods such as MFCC and the discrete wavelet transform (DWT) work well for high-resource languages but are less accurate for low-resource languages. This thesis presents a new approach to Pashto dialect recognition using an adaptive filter bank with MFCC and DWT. The approach extracts features using the adaptive filter bank in MFCC and DWT, followed by classification using statistical pattern matching (HMM) and the machine learning classifiers K-nearest neighbors (KNN) and support vector machine (SVM). Three proposed models are tested and compared with state-of-the-art techniques. The proposed method achieved an overall accuracy of 88%.
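
For context on the filter bank that the adaptive approach modifies, the following is a minimal sketch of a conventional, fixed triangular mel filter bank as commonly used in MFCC extraction. It is not the adaptive filter bank proposed in the thesis, whose construction the abstract does not specify. It assumes the widely used conversion mel = 2595 log10(1 + f/700), and all function and parameter names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate, f_low=0.0, f_high=None):
    """Return n_filters rows of triangular weights over n_fft // 2 + 1 bins.

    Filter edges are equally spaced on the mel scale, so the filters are
    narrow at low frequencies and wide at high frequencies. The spacing
    is fixed here; it does not adapt to the input spectrum.
    """
    if f_high is None:
        f_high = sample_rate / 2.0
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_filters + 1)
    # n_filters + 2 edge frequencies: each filter uses three consecutive edges.
    edges_hz = [mel_to_hz(mel_low + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]

    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


if __name__ == "__main__":
    bank = mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000)
    print(len(bank), len(bank[0]))
```

Multiplying each row by a frame's power spectrum and summing gives one filter bank energy per filter. In a typical MFCC pipeline, the logarithm and a discrete cosine transform of those energies follow.
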