Add/Update Thesis
Title*
Author's Name*
Supervisor's Name
Abstract
With the emergence of big data analytics in the last decade, the importance of analyzing semi-structured and unstructured data (such as text) has also been highlighted. Since text (such as customer reviews, newspaper articles, etc.) contains significant business information, text analytics has become more important for predicting, inferring, or analysing information that adds value to a business. In this research, we present a unified approach for intelligent association analysis of text, which measures how closely a piece of text is related to a customer or a person. In this dissertation, an approach is presented for authorship attribution in Urdu text, using an LDA (Latent Dirichlet Allocation) model with word n-grams of authors' texts and improved sqrt-cosine similarity, for the purpose of forensic analysis. The proposed approach uses word n-grams to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author. The LDA-based approach emphasizes instance-based and profile-based classification of an author's text. Here, LDA suitably handles high-dimensional and sparse data by allowing a more expressive representation of text. The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, the diversity in authors' writing styles, and the inherent ambiguity of the Urdu language. A large corpus has been collected to test the performance of the presented approach. The experimental results show the superiority of the proposed approach over state-of-the-art representations and other algorithms used for authorship attribution.
The contributions of the presented work are manifold: the use of improved sqrt-cosine similarity with LDA topics to measure the similarity between vectors of text documents for forensic analysis; the construction of a large dataset of 6,000 column documents; and results of 92% on Urdu columns with fifteen authors and 78.57% on the PAN12 English dataset with fourteen authors, obtained without using any labels for the authorship attribution task.
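The similarity step mentioned above can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the commonly published definition of improved sqrt-cosine similarity, ISC(x, y) = Σ sqrt(x_i · y_i) / (sqrt(Σ x_i) · sqrt(Σ y_i)), applied to non-negative LDA topic distributions. The function names, the author labels, and the topic vectors are hypothetical and are not taken from the thesis.

```python
import math


def isc_similarity(x, y):
    """Improved sqrt-cosine similarity between two non-negative vectors.

    Assumed definition (not stated in the abstract):
    sum(sqrt(x_i * y_i)) / (sqrt(sum(x)) * sqrt(sum(y))).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("vectors must be non-negative")
    sum_x, sum_y = sum(x), sum(y)
    if sum_x == 0 or sum_y == 0:
        return 0.0  # convention chosen here for all-zero vectors
    numerator = sum(math.sqrt(a * b) for a, b in zip(x, y))
    return numerator / (math.sqrt(sum_x) * math.sqrt(sum_y))


def attribute(doc_topics, author_profiles):
    """Return the author whose topic profile is most similar to the document."""
    return max(
        author_profiles,
        key=lambda name: isc_similarity(doc_topics, author_profiles[name]),
    )


# Hypothetical topic distributions, as an LDA model might produce them.
profiles = {
    "author_a": [0.6, 0.3, 0.1],
    "author_b": [0.1, 0.2, 0.7],
}
unknown_doc = [0.7, 0.2, 0.1]
print(attribute(unknown_doc, profiles))
```

In a profile-based setting, each author would have one topic vector built from all of that author's texts; in an instance-based setting, the same similarity would be computed against each individual document instead.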
Subject/Specialization
Language
Program
Faculty/Department's Name
Institute Name
University Type
Public
Private
Campus (if any)
Institute Affiliation Information (if any)
City where institute is located
Province
Country
Degree Starting Year
Degree Completion Year
Year of Viva Voce Exam
Thesis Completion Year
Thesis Status
Completed
Incomplete
Number of Pages
Urdu Keywords
English Keywords
Link
Select Category
Religious Studies
Social Sciences & Humanities
Science
Technology
Any other information you want to share, such as the Table of Contents or Conclusion.
Your email address*