Authorship Attribution for Urdu Newspapers Columns Using Text Mining Techniques

Thesis Info

Access Option

External Link

Author

Waheed Anwar

Program

PhD

Institute

COMSATS University Islamabad

City

Islamabad

Province

Islamabad

Country

Pakistan

Thesis Completing Year

2019

Thesis Completion Status

Completed

Subject

Computer Science

Language

English

Link

http://prr.hec.gov.pk/jspui/bitstream/123456789/12355/1/Waheed%20Anwar%20Computer%20Sci%202019%20iub%20prr.pdf

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727707210


With the emergence of big data analytics in the last decade, the importance of analyzing semi-structured and unstructured data (such as text) has also been highlighted. Because text (such as customer reviews and newspaper articles) contains significant business information, text analytics has become more important for predicting, inferring, or analysing information that adds value to a business. In this research, we present a unified approach for intelligent association analysis of text, which measures how closely a piece of text is related to a customer or a person. In this dissertation, an approach is presented for authorship attribution in Urdu text, for the purpose of forensic analysis, using a Latent Dirichlet Allocation (LDA) model over word n-grams of the authors' texts together with improved sqrt-cosine similarity. The proposed approach uses word n-grams to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author. The LDA-based approach emphasizes instance-based and profile-based classification of an author's text. Here, LDA suitably handles high-dimensional and sparse data by allowing a more expressive representation of text. The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, the diversity in authors' writing styles, and the inherent ambiguity of the Urdu language. A large corpus has been collected for performance testing of the presented approach. The experimental results show the superiority of the proposed approach over state-of-the-art representations and other algorithms used for authorship attribution.
The contributions of the presented work are: the use of improved sqrt-cosine similarity with LDA topics to measure similarity between vectors of text documents for forensic analysis; the construction of a large dataset of 6000 column documents; and results of 92% on Urdu columns with fifteen authors and 78.57% on the PAN12 English dataset with fourteen authors, obtained without using any labels for the authorship attribution task.
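The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited definition of improved sqrt-cosine similarity for non-negative vectors, ISC(x, y) = Σ sqrt(x_i · y_i) / (sqrt(Σ x_i) · sqrt(Σ y_i)); the thesis may use a different variant. The function names, the author names, and the three-topic vectors below are hypothetical, and the topic vectors stand in for LDA output, which is not computed here. The nearest-profile rule is one plausible reading of "profile-based" attribution, not the author's confirmed procedure.

```python
import math


def improved_sqrt_cosine(x, y):
    """Improved sqrt-cosine similarity of two non-negative, equal-length vectors.

    Assumed definition: sum(sqrt(x_i * y_i)) / (sqrt(sum(x)) * sqrt(sum(y))).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("vectors must be non-negative")
    denom = math.sqrt(sum(x)) * math.sqrt(sum(y))
    if denom == 0:
        return 0.0  # an all-zero vector has no overlap with anything
    return sum(math.sqrt(a * b) for a, b in zip(x, y)) / denom


def attribute(doc_topics, author_profiles):
    """Return the author whose topic profile is most similar to the document."""
    return max(
        author_profiles,
        key=lambda author: improved_sqrt_cosine(doc_topics, author_profiles[author]),
    )


# Hypothetical topic distributions (stand-ins for LDA output over word n-grams).
profiles = {
    "author_a": [0.7, 0.2, 0.1],
    "author_b": [0.1, 0.3, 0.6],
}
unknown_doc = [0.6, 0.3, 0.1]
predicted = attribute(unknown_doc, profiles)
print(predicted)
```

Because the comparison needs only topic vectors and no author labels during topic learning, a sketch like this is compatible with the unsupervised setting the abstract describes; in practice each profile vector would be derived from an author's known documents.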