Enhancing Accuracy of Urdu Sentiments Analysis, Using Lexicon-Based Approach

Thesis Info

Access Option

External Link

Author

Chiragh, Neelam.

Program

PhD

Institute

University of Peshawar

City

Peshawar

Province

KPK

Country

Pakistan

Thesis Completing Year

2018

Thesis Completion Status

Completed

Subject

Social sciences

Language

English

Link

http://prr.hec.gov.pk/jspui/bitstream/123456789/9257/1/Neelam%20Chiragh_CS_2018_UoPeshawar.pdf

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676724682133


In this research, the accuracy of Urdu Sentiment Analysis in multiple domains is enhanced by using the Lexicon-based approach. Unlike the traditional approach, which considers adjectives only, the lexicon also includes nouns and verbs. An efficient Urdu Sentiment Analyzer is developed that applies rules and uses this new lexicon to classify sentences as positive, negative or neutral. To enhance accuracy, specific rules for handling negations, intensifiers and context-dependent words are incorporated in the Urdu Sentiment Analyzer. For testing the Lexicon-based approach, a corpus of 6025 sentences from 151 blogs belonging to 14 different genres is collected, and three human annotators label each sentence as positive, negative or neutral. Evaluating the Urdu Sentiment Analyzer on sentences from this corpus yields the most promising results so far for the Urdu language (to the best of the author's knowledge), with 89.03% accuracy, 0.86 precision, 0.90 recall and 0.88 f-measure. Comparison with previous work in Urdu Sentiment Analysis shows that the combination of this Urdu Sentiment Lexicon and Urdu Sentiment Analyzer is much more effective than previous such combinations. The main reasons for the improvement are the development of a wide-coverage lexicon and the effective handling of negations, intensifiers and context-dependent words by the Urdu Sentiment Analyzer. Although the Lexicon-based approach achieves high accuracy in multiple domains for Urdu Sentiment Analysis, which is the main objective of this research, a Supervised Machine Learning approach is also used for comparison.
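The rule-based, lexicon-driven classification described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny romanized lexicon, the weights, the `classify` function, and the two rules (an intensifier scales the next sentiment word; a negation flips the nearest preceding sentiment word, reflecting that Urdu "nahi" typically follows the word it negates). It is not the thesis's lexicon or rule set, and it does not cover context-dependent words.

```python
# Toy, hypothetical resources (romanized Urdu); NOT the thesis's lexicon or rules.
LEXICON = {"acha": 1.0, "behtareen": 2.0, "bura": -1.0, "kharab": -1.0}
INTENSIFIERS = {"bohat": 1.5}   # "very": scales the next sentiment word
NEGATIONS = {"nahi"}            # "not": flips the preceding sentiment word


def classify(sentence):
    """Label a whitespace-tokenized sentence as positive, negative or neutral."""
    scores = []   # one signed score per sentiment word found so far
    boost = 1.0   # pending intensifier multiplier
    for token in sentence.lower().split():
        if token in INTENSIFIERS:
            boost = INTENSIFIERS[token]
        elif token in LEXICON:
            scores.append(LEXICON[token] * boost)
            boost = 1.0
        elif token in NEGATIONS and scores:
            # Assumed rule: negation reverses the most recent sentiment word.
            scores[-1] = -scores[-1]
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["khana bohat acha hai", "khana acha nahi hai", "yeh khana hai"]:
        print(text, "->", classify(text))
```

A sentence with no lexicon hits falls through to "neutral", which is one simple way to obtain the three-way labeling the abstract mentions.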
Three well-known classifiers, Support Vector Machine, Decision Tree and K Nearest Neighbor, are tested; their outputs are compared and their results are improved over several iterations. It is further concluded that K Nearest Neighbor performs better than Support Vector Machine and Decision Tree. To verify this result, three evaluation measures are used: McNemar's Test, Kappa Statistic and Root Mean Squared Error. All three measures confirm that K Nearest Neighbor performs much better than the other two classifiers, achieving 67.02% accuracy, 0.68 precision, 0.67 recall and 0.67 f-measure. The results from both approaches are then compared. On the basis of the experiments performed in this research, it is concluded that the Lexicon-based approach outperforms the Supervised Machine Learning approach for Urdu Sentiment Analysis in multiple domains in terms of accuracy, precision, recall and f-measure, as well as economy of time and effort.
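For readers unfamiliar with the reported measures, the following sketch shows how accuracy, precision, recall, f-measure and the Kappa Statistic (Cohen's kappa) can be computed from gold and predicted labels. The function name `evaluate`, the macro-averaging over classes, and the example labels are assumptions for illustration; the abstract does not state how the thesis averaged its per-class scores.

```python
from collections import Counter


def evaluate(gold, pred):
    """Return accuracy, macro precision/recall, f-measure and Cohen's kappa."""
    labels = sorted(set(gold) | set(pred))
    n = len(gold)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / n

    precisions, recalls = [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        predicted = sum(p == label for p in pred)
        actual = sum(g == label for g in gold)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)

    # Assumption: macro-average over classes, then take the harmonic mean.
    precision = sum(precisions) / len(labels)
    recall = sum(recalls) / len(labels)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    expected = sum(gold_counts[l] * pred_counts[l] for l in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 1.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    gold = ["pos", "pos", "neg", "neg"]   # made-up labels, not thesis data
    pred = ["pos", "neg", "neg", "neg"]
    print(evaluate(gold, pred))
```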