Redefining Urdu Morphology and Grammar for the Development of an Integrated Sentiment Analysis Framework

Thesis Info

Access Option

External Link

Author

Syed, Afraz Zahra

Program

PhD

Institute

University of Engineering and Technology

City

Lahore

Province

Punjab

Country

Pakistan

Thesis Completing Year

2013

Thesis Completion Status

Completed

Subject

Computer Science

Language

English

Link

http://prr.hec.gov.pk/jspui/bitstream/123456789/2223/1/2773S.pdf

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727820602

The rise of social networking sites and blogs has stimulated a bull market in personal opinion: consumer recommendations, product reviews, ratings, and other types of online expression. For computational linguistics researchers, this fast-growing heap of information has opened an exciting research frontier, referred to as sentiment analysis (SA). For English, this area has been under investigation for the last decade, but other major languages, like Urdu, have been almost entirely overlooked by the research community. Urdu is a morphologically rich and resource-poor language. Its distinctive features, such as complex morphology, flexible grammar rules, context-sensitive orthography, and free word order, make Urdu language processing a challenging problem domain. For the same reasons, sentiment analysis approaches and techniques developed for other well-explored languages are not workable for Urdu text. This dissertation presents a grammatically motivated sentiment classification framework to handle these distinctive features of the Urdu language. The main research contributions are: to highlight the linguistic (orthography, grammar, morphology, etc.) as well as technical (parsing algorithm, lexicon, corpus, etc.) aspects of this multidimensional research problem; to explore Urdu morphological operations, grammar, and orthographic rules; and to redefine these operations and rules with respect to the requirements of a sentiment analysis framework. The orthographical, morphological, grammatical, and finally the conceptual details of the language are our target concerns. Additionally, our approach can help in the sentiment analysis of other languages, like Arabic, Persian, Hindi, and Punjabi. The proposed framework emphasizes the identification of SentiUnits rather than the subjective words in the given text. SentiUnits are the sentiment-carrier expressions, which reveal the inherent sentiments of the sentence for a specific target.
The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and the target expressions through shallow-parsing-based chunking. A dependency parsing algorithm creates associations between these extracted expressions. The framework uses a sentiment-annotated, lexicon-based approach. Each entry of the lexicon is marked with its orientation (positive or negative) and its intensity (force of orientation) score. An experimental evaluation of the system, with a sentiment-annotated lexicon of Urdu words and two corpora of reviews as test beds, shows encouraging results in terms of accuracy, precision, recall, and F-measure.
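
As a rough illustration of the lexicon-based idea described in the abstract, the following minimal Python sketch scores SentiUnits using entries that carry an orientation and an intensity. It is not the thesis's implementation: the names `LexiconEntry`, `SentiUnit`, `score_unit`, and `classify`, the romanized toy lexicon, the intensity values, and the rule of summing orientation times intensity are all assumptions made for illustration, and the chunking and dependency parsing steps are assumed to have already produced the units and their targets.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LexiconEntry:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of orientation; the (0, 1] scale is an assumption


# Hypothetical toy lexicon in romanized Urdu; entries and scores are illustrative only.
LEXICON: Dict[str, LexiconEntry] = {
    "acha": LexiconEntry(+1, 0.5),        # "good"
    "behtareen": LexiconEntry(+1, 1.0),   # "excellent"
    "bura": LexiconEntry(-1, 0.5),        # "bad"
    "kharab": LexiconEntry(-1, 0.75),     # "poor, faulty"
}


@dataclass
class SentiUnit:
    words: List[str]   # words of the sentiment-carrier expression
    target: str        # noun phrase the opinion is about (assumed already linked)


def score_unit(unit: SentiUnit) -> float:
    """Sum orientation * intensity over the unit's words found in the lexicon."""
    return sum(
        LEXICON[w].orientation * LEXICON[w].intensity
        for w in unit.words
        if w in LEXICON
    )


def classify(units: List[SentiUnit]) -> str:
    """Label a sentence by the sign of the summed SentiUnit scores (assumed rule)."""
    total = sum(score_unit(u) for u in units)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sentence_units = [
        SentiUnit(words=["acha"], target="camera"),
        SentiUnit(words=["kharab"], target="battery"),
    ]
    for u in sentence_units:
        print(u.target, score_unit(u))
    print(classify(sentence_units))
```

Scoring each unit separately keeps the per-target sentiment available, which matches the abstract's emphasis on sentiments "for a specific target"; the sentence-level sum is only one possible way to aggregate.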
