Text-Independent Speaker Verification System for Pashto Speakers With Accent and Dialect Recognition

Thesis Info

Access Option

External Link

Author

Shah, Shahid Munir

Program

PhD

Institute

University of Sindh

City

Jamshoro

Province

Sindh

Country

Pakistan

Thesis Completion Year

2019

Thesis Completion Status

Completed

Subject

Computer Science

Language

English

Link

http://prr.hec.gov.pk/jspui/bitstream/123456789/12409/1/Shahid%20Munir%20Shah%20Information%20TEch%202019%20uni%20of%20sindh%20prr.pdf

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727847035


In this thesis, a text-independent speaker verification system for Pashto speakers using an accent and dialect recognition approach has been designed. The system is intended to recognize the region of origin of native Pashto speakers on the basis of their distinct dialects and to verify their identity through speaker verification. Because no Pashto speech data covering different accents and dialects was available, a database of Pashto speakers covering several Pashto dialects was developed. To build it, the dialectal variations of the Pashto language were first studied in detail, and speech data were then collected only from those regions of Pakistan and Afghanistan where Pashto is spoken in distinct dialects. The collected data were processed through front-end processing and feature extraction, in which Mel Frequency Cepstral Coefficient (MFCC) features were extracted. Using the resulting MFCC feature vectors, a Multilayer Perceptron (MLP) based classifier was designed to classify the speakers. Two separate classification experiments were performed: (1) speaker identification followed by dialect identification, and (2) text-independent speaker verification followed by dialect identification. Speaker identification followed by dialect identification achieved 96.0 % identification accuracy, whereas speaker verification followed by dialect identification achieved 100 % verification accuracy. In addition, the proposed Gaussian Mixture Model (GMM) based dialect identification system achieved 93.8 % accuracy in identifying native Pashto dialects. To assess the noise robustness of the proposed system, its performance was evaluated at different noise levels, and the Signal-to-Noise Ratio (SNR) was computed for each level.
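The abstract reports that the SNR was computed for each degree of noise but does not say how the noisy conditions were produced. As a minimal sketch, assuming additive noise and the standard power-ratio definition SNR_dB = 10 log10(P_signal / P_noise), the snippet below rescales a noise signal so that it sits at a chosen SNR relative to a clean signal, then measures the result. The helper names (`signal_power`, `snr_db`, `add_noise_at_snr`), the 16 kHz sampling rate, the white Gaussian noise, and the synthetic tone standing in for speech are illustrative assumptions, not the thesis's actual noise-injection procedure.

```python
import math
import random


def signal_power(x):
    """Average power of a sampled signal: mean of the squared samples."""
    return sum(s * s for s in x) / len(x)


def snr_db(clean, noise):
    """SNR in decibels: 10 * log10(P_clean / P_noise)."""
    return 10.0 * math.log10(signal_power(clean) / signal_power(noise))


def add_noise_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR.

    Returns (noisy_signal, scaled_noise). Both inputs must have equal length.
    """
    # Noise power needed for the target: P_noise = P_clean / 10^(SNR_dB / 10)
    wanted_power = signal_power(clean) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_power / signal_power(noise))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Usage (assumed setup): a 1 s, 220 Hz tone at 16 kHz stands in for speech.
fs = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(fs)]
for level in (20, 10, 5, 0):  # increasing degrees of noise
    noisy, scaled = add_noise_at_snr(clean, white, level)
    print(f"target {level:>2} dB -> measured {snr_db(clean, scaled):.2f} dB")
```

Because the same noise realisation is rescaled rather than regenerated at each level, only the SNR changes between test conditions, which keeps the comparison across noise levels controlled.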
The system’s performance degraded only slightly as the noise level increased, indicating its robustness against noise. A simple Pashto digit recognition task (the Pashto digits 1 to 10) was also included in the study, using MLP, Hidden Markov Model (HMM) and Support Vector Machine (SVM) classifiers. Comparative analysis showed that the SVM-based Pashto digit recognizer, with 98.5 % recognition accuracy, outperformed both the MLP- and HMM-based recognizers, with improvements in recognition accuracy of 1.3 % and 3.3 %, respectively. To benchmark the proposed research further, the system was also tested on classifying a foreign accent of Pashto (Urdu-accented Pashto), for which it achieved 74.4 % recognition accuracy. Finally, the results of the conducted experiments were compared with recently proposed state-of-the-art dialect identification, speaker verification and Pashto digit recognition systems. The comparative study showed that the proposed system outperformed some recently proposed dialect identification and speaker verification systems, with relative improvements in recognition accuracy.
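The abstract states that the GMM-based dialect identification system reached 93.8 % accuracy but does not describe its decision rule. A common approach, shown here as a minimal sketch, models each dialect with its own GMM over MFCC frames and labels an utterance with the dialect whose model gives the highest average per-frame log-likelihood. The snippet assumes the per-dialect GMM parameters (weights, means, diagonal variances) have already been trained, for example with EM; the dictionary layout, the function names, the 2-D toy features, and the labels `dialect_A`/`dialect_B` are hypothetical and not taken from the thesis.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """log p(x | GMM) = log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance's feature frames."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)


def identify_dialect(frames, models):
    """Pick the dialect whose GMM scores the utterance highest."""
    return max(models, key=lambda name: utterance_score(frames, models[name]))


# Usage with toy, already-trained 2-D models (real MFCC frames have more dims).
models = {
    "dialect_A": {"weights": [0.5, 0.5],
                  "means": [[0.0, 0.0], [1.0, 1.0]],
                  "vars": [[1.0, 1.0], [1.0, 1.0]]},
    "dialect_B": {"weights": [0.5, 0.5],
                  "means": [[5.0, 5.0], [6.0, 6.0]],
                  "vars": [[1.0, 1.0], [1.0, 1.0]]},
}
frames = [[0.2, 0.1], [0.9, 1.1], [0.5, 0.4]]
print(identify_dialect(frames, models))  # prints dialect_A
```

Averaging over frames keeps scores comparable between utterances of different lengths, and the log-sum-exp step avoids numerical underflow when individual component densities are very small.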

Similar Article Headings


Literal and Technical Meanings of Translation

Translation (tarjuma):
According to Feroz-ul-Lughat:
"That which is expressed from one language into another language"
Its English equivalent is the word Translation. Tarjuma also means "to carry across".
According to Susan:
"Translation guarantees a text a second life after its death, and is also a new original in another language"

The Effect of the Learning Environment and the Online Learning System Using Google Meet on the Learning Motivation of Accounting Students at UIN Sultan Syarif Kasim Riau during the Covid-19 Pandemic

During the Covid-19 pandemic, many changes occurred in students' learning environment and learning system, which affected their learning motivation. The aim of this study is to determine how the learning environment and the online learning system using Google Meet affect students' learning motivation during the Covid-19 pandemic. The study used a survey method, testing hypotheses with quantitative data. The results show that the learning environment and the online learning system using Google Meet have a positive effect on students' learning motivation. This indicates that the changes in the learning environment and learning system during the Covid-19 pandemic made students more independent and more critical in their thinking.

Recognizing Human Actions in Realistic and Complex Scenarios Using Bag of Expression (BoE) Model

Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical Bag of Words (BoW) approach, along with its variations, has been widely used. In this dissertation, we propose a novel feature representation approach for action representation in complex and realistic scenarios. We also present an approach to handle the inter- and intra-class variation challenge present in human action recognition. The primary focus of this research is to enhance the existing strengths of the BoW approach, such as view independence, scale invariance and occlusion handling. The proposed Bag of Expressions (BoE) includes an independent pair of neighbors for building expressions; therefore, it is tolerant to occlusion and can handle view independence to some extent in realistic scenarios. We apply a class-specific visual word extraction approach to establish relationships between the extracted visual words in both the space and time dimensions. To improve classical BoW, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatiotemporal cube around a visual word. To handle inter-class variation, we use a class-specific visual word representation to generate visual expressions.
The formation of visual expressions is based on the density of the spatiotemporal cube built around each visual word, because constructing neighborhoods with a fixed number of neighbors would include irrelevant information, making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. This makes the proposed model more robust to the occlusion and viewpoint changes present in realistic scenarios. Comprehensive experiments on publicly available datasets show that the proposed approach outperforms existing state-of-the-art human action recognition approaches.