Generic Urdu NLP Framework for Urdu Text Analysis: Hybridization of Heuristics and Machine Learning Techniques

Thesis Info

Access Option

External Link

Author

Khan, Wahab

Supervisor

Ali Daud

Program

PhD

Institute

International Islamic University

City

Islamabad

Province

Islamabad

Country

Pakistan

Thesis Completion Year

2019

Thesis Completion Status

Completed

Subject

Computer Science

Language

English

Link

http://prr.hec.gov.pk/jspui/bitstream/123456789/10445/1/Wahab%20Khan_CS_2019_IIU_Incomp.pdf

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727764317

The internet was initially designed to present information to users in English. Over time, with the development of standard web technologies such as browsers, programming languages, libraries, frameworks, databases, front ends and back ends, protocols, APIs, and data formats, it became a multilingual source of information. In the last few years, the natural language processing (NLP) research community has observed rapid growth in online multilingual content and therefore aims to explore monolingual and cross-lingual information retrieval (IR) tasks. Digital online content in Urdu is also increasing at a rapid pace. Urdu, the national language of Pakistan and the most widely spoken and understood language of the Indian subcontinent, is considered a low-resource language (Mukund, Srihari, & Peterson, 2010).

Part-of-speech (POS) tagging and named entity recognition (NER) are considered the most basic NLP tasks, and both are very hard to investigate in Urdu. POS tagging, the assignment of syntactic categories to words in running text, is significant to NLP as a preliminary task in applications such as speech processing and information extraction. NER is the identification of all proper nouns in a text and their classification into predefined categories such as persons, locations, organizations, time expressions, quantities, and monetary values. It is considered a sub-task of information extraction (IE) and machine translation, and it is one of the hardest tasks in Urdu language processing. Most previous Urdu NER systems are based on machine learning (ML) models. However, ML models need sufficiently large annotated corpora to perform well (Das, Ganguly, & Garain, 2017), and Urdu is a scarce-resource language for which no sufficiently large annotated corpus is available for evaluating ML models. Adopting a semi-supervised approach, which relies largely on huge amounts of unlabeled data, is therefore a feasible solution.

In this thesis, we propose a generic Urdu NLP framework for Urdu text analysis based on machine learning (ML) and deep learning approaches. First, we addressed POS tagging challenges by developing a novel tagging approach using linear-chain conditional random fields (CRF). We employed a strong, stable, and balanced set of language-independent and language-dependent features for the Urdu POS task, together with a context-word window. Our approach was evaluated on two benchmark datasets against a support vector machine (SVM) technique for Urdu POS tagging that is considered the state of the art. The results show that our CRF approach improves upon the F-measure of prior attempts by 8.3 to 8.5%.

Second, we adopted deep recurrent neural network (DRNN) learning algorithms with various model structures, using word embeddings as a feature, for the task of Urdu named entity recognition and classification. These DRNN models include a long short-term memory (LSTM) forward recurrent neural network (RNN), an LSTM bi-directional RNN, a backpropagation through time (BPTT) forward RNN, and a BPTT bi-directional RNN. We consider language-dependent features such as POS tags as well as language-independent features such as N-grams. Our results show that the proposed DRNN-based approach outperforms existing work that employs CRF-based approaches. Our work is the first to use a DRNN architecture with word embeddings as a feature for the Urdu NER task, and it improves upon prior attempts by 9.5% in the case of maximum margin.
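The context-word window used for POS tagging can be illustrated with a minimal sketch. The abstract does not list the thesis's actual features, so the function names, the window size, the affix lengths, and the `<PAD>` marker below are all hypothetical choices made only for illustration. The sketch builds one feature dictionary per token, a form that sequence labelers such as linear-chain CRF toolkits commonly take as input.

```python
from typing import Dict, List


def window_features(tokens: List[str], i: int, size: int = 2) -> Dict[str, str]:
    """Build a feature dict for tokens[i] from a symmetric context window.

    All feature names here are illustrative assumptions, not the thesis's
    actual feature set.
    """
    word = tokens[i]
    feats = {
        "word": word,
        "prefix2": word[:2],      # language-independent affix cues
        "suffix2": word[-2:],
        "length": str(len(word)),
    }
    # Add the neighbouring words on each side, padding past sentence edges.
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbour = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats[f"word[{offset:+d}]"] = neighbour
    return feats


def sentence_features(tokens: List[str], size: int = 2) -> List[Dict[str, str]]:
    """Feature dicts for every token in one sentence."""
    return [window_features(tokens, i, size) for i in range(len(tokens))]


if __name__ == "__main__":
    # Placeholder tokens; a real run would use a tokenized Urdu sentence.
    for feats in sentence_features(["w1", "w2", "w3"], size=1):
        print(feats)
```

A wider window gives the tagger more context per token at the cost of sparser features; language-dependent cues (for example, lexicon lookups) would be added to the same dictionary.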

Similar Article Headings

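The Journey of Their Thoughts Is Toward the City of Madinah

The journey of their thoughts is toward the city of Madinah,
those whose realm of dreams lies toward the city of Madinah.

The hearts and souls of those who return remain behind there;
the gaze of those who depart is fixed toward the city of Madinah.

The qibla of world and faith, the lamp of the manifest sanctuary,
the guide of jinn and humankind: toward the city of Madinah.

Where in any corner of the world is such solace found?
The door of Paradise opens toward the city of Madinah.

There, Mount Sinai appears in every grain of dust;
every drop is a pearl toward the city of Madinah.

If a glimpse of Jami's ardor came into my words, I would say:
"The journey of the line of praise is toward the city of Madinah."

Only sincere devotion avails, Irfan;
gold does not carry one toward the city of Madinah.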

The Shariah Ruling on Mother's Milk Banks

The emergence of human milk banks for premature and underweight babies in the early twentieth century raised many questions about the prohibitions that breastfeeding kinship creates in Islamic jurisprudence. Many Islamic scholars tried to resolve the matter in the light of the Quran, the Sunnah, and the sayings of the early Imams of fiqh, but their opinions differed, much as the sayings of some Imams differ. Eventually the Islamic Organization for Medical Sciences, based in Kuwait, and the Islamic Fiqh Academy, Jeddah, convened summits on the issue and decided against establishing such banks in the Islamic world. The issue seemed almost settled until the European Council for Fatwa and Research appealed against this decision in 2003 and demanded that the use of human milk from these banks be legitimized for the children of Muslim families in Europe and the USA, invoking the fiqh principle of widespread affliction (Umum al Balwa). This appeal reopened the discussion. This article gives an overview of the sayings of early and modern jurists and of the pros and cons of human milk banks, seeking a solution to this modern problem in the light of Islamic Shariah, so that a just and balanced opinion may be adopted, as is the motto of Islamic law. The discussion also bears on many new problems faced by Muslim communities in European as well as Islamic countries in the modern era.

Spectrum Decision Support Framework for Cognitive Radio Networks

The exponential increase in mobile devices and the wide availability of bandwidth-hungry applications have created an eruption in mobile data traffic. Such extraordinary growth in wireless data usage causes a severe capacity shortage in wireless mobile networks and presents substantial challenges to cellular operators and telecommunication regulatory authorities. Operators consider various technologies to improve their infrastructure, such as upgrading their entire network to LTE, taking advantage of existing available spectrum, or leveraging new spectrum opportunities such as the newly vacated TV band. However, such network designs do not facilitate robustness in spectrum usage. A cognitive radio network (CRN) offers a capable solution for assuaging this problem. In mobile networks, the wireless spectrum bands are also used by secondary users (SUs) in the absence of the licensed users, the primary users (PUs).

Spectrum decision must be performed by SUs while catering for the fluctuating, inconsistent behavior of spectrum slots and the diverse service requirements of various wireless applications, with the aim of optimizing the transmission performance of SUs. An SU has to sense multiple target spectrum slots in the shortest possible time before selecting and occupying the idle slot most suitable to its quality of service (QoS) requirements for its transmission. The spectrum decision process selects the most suitable of these available slots for opportunistic use by SUs.

A support framework for CRNs, called the Spectrum Decision Support Framework (SDSF), has been proposed. SDSF offers an intelligent spectrum decision scheme that first senses the idle slots and then enables SUs to occupy them swiftly and effectively. SDSF integrates various spectrum decision techniques and takes into account various spectrum slot characterization parameters. A scientific support framework has been developed for SUs in the CRN, which covers spectrum slots vis-à-vis SUs' QoS requirements and a simulation evaluation duly validated by practical implementation.

In this thesis, the proposed SDSF not only enables SUs to occupy the discretely time- and frequency-slotted channels in the entire wireless spectrum, encompassing the spectrum bands of IEEE 802.22, GSM, CDMA, LTE, IEEE 802.11, Bluetooth, UWB, and 5G, but also guarantees the QoS requirements of SUs as per wireless service applications and ensures no interference with PUs. Initially, the SDSF comprises three wireless spectrum slot parameters: spectrum slot idle time, measured from the history of PUs' access; spectrum slot possession by the PUs; and spectrum slot QoS. This scheme was validated by the throughput achieved by SUs at the end of their transmission. The achieved throughput leads to the logical architectural design of 5G services, providing the flexibility required to efficiently support a heterogeneous set of wireless services, including Internet of Things traffic. The proposed SDSF guaranteed the QoS requirements for these applications in terms of end-to-end latency, SUs' mobility, and no interference with PUs or with other SUs of the CRN.

An empirical SDSF for CRNs has also been proposed, consisting of a signal generator, a USRP2, and a network analyzer, and based on the sensing data obtained by a central SU from other (slave) SUs in the CRN. The results obtained validate that the proposed SDSF satisfies complementary receiver operating characteristics at various signal-to-noise ratios, end-to-end latencies, and levels of network congestion. The simulation results indicate the validity of the proposed schemes for spectrum decision in cognitive radio networks.
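The three slot parameters named above (idle time, possession by PUs, and slot QoS) suggest how a slot might be ranked. The sketch below is only an illustration under stated assumptions: the abstract does not give the SDSF decision rule, so the `Slot` fields, the normalization to the range 0 to 1, the weights, and the weighted-sum scoring are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """One spectrum slot; all fields are assumed to be normalized to [0, 1]."""
    name: str
    idle_time: float      # expected idle duration, estimated from PU access history
    pu_possession: float  # fraction of time the slot is held by PUs
    qos: float            # QoS level the slot can offer


def score(slot: Slot, w_idle: float = 0.4, w_poss: float = 0.3,
          w_qos: float = 0.3) -> float:
    """Hypothetical weighted score: longer idle time, lower PU possession,
    and higher QoS all raise the score."""
    return (w_idle * slot.idle_time
            + w_poss * (1.0 - slot.pu_possession)
            + w_qos * slot.qos)


def choose_slot(slots: List[Slot], min_qos: float) -> Optional[Slot]:
    """Return the best-scoring slot that meets the SU's QoS requirement,
    or None when no slot qualifies."""
    eligible = [s for s in slots if s.qos >= min_qos]
    return max(eligible, key=score, default=None)


if __name__ == "__main__":
    candidates = [
        Slot("slot-a", idle_time=0.9, pu_possession=0.1, qos=0.2),
        Slot("slot-b", idle_time=0.5, pu_possession=0.5, qos=0.8),
        Slot("slot-c", idle_time=0.8, pu_possession=0.2, qos=0.7),
    ]
    best = choose_slot(candidates, min_qos=0.5)
    print(best.name if best else "no suitable slot")
```

Filtering on the QoS requirement before scoring keeps a slot that is often idle but too poor for the application from being selected; a real scheme would also have to account for sensing time and interference constraints.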