A Study of Plagiarism Detection Using Natural Language Processing Technique

Thesis Info

Access Option

External Link

Author

Nasreen Malik

Institute

Virtual University of Pakistan

Institute Type

Public

City

Lahore

Province

Punjab

Country

Pakistan

Thesis Completing Year

2018

Thesis Completion Status

Completed

Subject

Software Engineering

Language

English

Link

http://vspace.vu.edu.pk/detail.aspx?id=157

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676720981450



Nowadays, plagiarism has become very common in many fields of life, such as research and education. It is an illicit act of presenting others' work as one's own property without proper references. Plagiarism is defined as presenting another person's work as your own, or using (stealing) another person's ideas without permission. Because plagiarists have adopted increasingly advanced techniques, it is very difficult for existing techniques to detect plagiarism accurately. Different features, such as syntactic, lexical, semantic and structural features, are examined to determine whether plagiarism is present in a document. Many techniques have been introduced to detect plagiarism, e.g. string matching, bag of words, fingerprinting, citation analysis and stylometry. Advanced detectors mostly work with either source code or natural language text. To detect similarity in natural language texts, detectors commonly search the Internet. In text analysis, detectors use very simple comparison procedures, chosen for broad coverage and processing speed. This research explores new and modern plagiarism detection tasks, especially text-based plagiarism detection, including monolingual plagiarism detection. The main idea behind this research is that rewritten text and its original are not textually identical, and the differences between such documents can be explored with the help of linguistic and statistical indicators. To investigate this hypothesis, the main research objective is formulated as follows: a novel four-stage framework for plagiarism detection is proposed. Instead of relying on traditional string-matching approaches, this framework uses Natural Language Processing (NLP). The model aims to apply text pre-processing together with statistical, shallow and deep linguistic techniques in a corpus-based approach. The proposed framework is evaluated through a theoretical comparison of its operation with other techniques.
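The abstract lists bag of words and fingerprinting among existing detection techniques. As a rough illustration only (this is not the thesis's four-stage NLP framework, whose stages are not detailed here), the sketch below pre-processes two texts, compares them as bags of words using cosine similarity, and compares sets of hashed word 3-grams (a simple fingerprint) using the Jaccard coefficient. The stop-word list, the n-gram size of 3, the sample sentences and all function names are illustrative assumptions.

```python
import hashlib
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the thesis).
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "is", "are", "to", "by"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bow_cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between the term-frequency (bag-of-words) vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fingerprints(tokens: list[str], n: int = 3) -> set[str]:
    """Hash every word n-gram; the resulting set acts as a simple fingerprint."""
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {hashlib.md5(g.encode("utf-8")).hexdigest() for g in grams}


def jaccard(x: set[str], y: set[str]) -> float:
    """Fraction of fingerprints shared by the two documents."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


if __name__ == "__main__":
    source = "Plagiarism is the use of another person's work without proper reference."
    suspect = "Plagiarism is using another person's work without a proper reference."
    s, t = preprocess(source), preprocess(suspect)
    print(f"bag-of-words cosine: {bow_cosine(s, t):.2f}")
    print(f"3-gram fingerprint Jaccard: {jaccard(fingerprints(s), fingerprints(t)):.2f}")
```

The two scores react differently to rewording: the bag-of-words score ignores word order and stays high for a light paraphrase, while the n-gram fingerprint score drops as soon as local word sequences change. This gap is one reason the abstract argues that simple string-based comparison is not enough on its own.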

Chapters

Title	Author	Supervisor	Degree	Institute

Similar Thesis

Title	Author	Supervisor	Degree	Institute

Similar Books

Book	Author(s)	Year	Publisher

Similar Chapters

Chapter	Author(s)	Book	Book Authors	Year	Publisher

Similar News

Headline	Date	News Paper	Country

Article Title	Authors	Journal	Vol Info	Language

Heading	Article Title	Authors	Journal	Vol Info

They Neither Refuse Nor Consent

They neither refuse nor consent;
Never before has this heart been so helpless.

We shared the sorrows and pains of the whole city;
When our own turn came, not a single friend was there.

Behold how low this fortune has fallen:
Let alone a flower, we do not even have a thorn.

Poverty has tormented us in such a way
That we are deprived of grain, and of love as well.

Who knows who those people were who found friends;
Since the dawn of time we have not even had strangers.

All the wealth I have, Fahad, is sincerity,
And he does not even seek sincerity.

Identification of Factors Contributing to Primary Female Subfertility by Diagnostic Hystero-Laparoscopy: An Experience of Private Hospital

Background: Management of subfertility is influenced by the diagnosis of its causative factor. Combined diagnostic hystero-laparoscopy has emerged as an effective procedure for identifying the causative factors of female subfertility. Objectives: This study aimed to identify factors contributing to primary female subfertility by diagnostic hystero-laparoscopy. Methods: This descriptive study was conducted at the Department of Obstetrics and Gynecology of Hameed Latif Hospital, Lahore, Pakistan, from December 2021 to May 2022. Data were collected from 344 women with primary female subfertility undergoing combined diagnostic hystero-laparoscopy. All demographic data, along with the causative factors identified during the procedure (tubal blockade, cervical os stenosis, endometrial polyp, uterine septum, uterine fibroid, endometriosis, peritubal adhesions and polycystic ovaries), were recorded on a predesigned study proforma. Data were analyzed using SPSS version 23. Results: The mean age of the patients was 25±5.0 years and the mean duration of subfertility was 3.8±0.55 years. Two hundred and eighty-four (82.56%) patients had abnormal findings, while sixty (17.44%) had normal findings. Of the 284 patients with abnormal findings, 94 (34%) had one identified factor, while 190 (66%) had two or more identified factors for primary subfertility. Polycystic ovaries were seen in 128 (37.21%) patients, followed by tubal blockade in 81 (23.54%) and peritubal adhesions/hydrosalpinx in 58 (16.86%) patients. Conclusions: Diagnostic hystero-laparoscopy is an effective diagnostic procedure for evaluating female-factor subfertility and may help gynecologists devise further management plans.

Corpus-Based Genre Analysis: Computer Science Research Article Introductions

Conventional definitions of genre, based on specific conventions of content (theme, setting, etc.) and form (structure and style), have been disputed. Some scholars reject rigid rules for including or excluding texts from a particular genre, since genres can be recognized intuitively through patterns of repetition and difference, owing to the 'family resemblances' among texts. Swales (1990) prefers the psycholinguistic concept of 'prototypicality'. Genres usually go through phases or cycles of popularity, because the crucial ideological concerns of the period in which they are popular are embodied in their generic conventions. The research article, a popular genre within the research and academic community, is undergoing continuous evolution. Many scholars have attempted to explore the complex process of writing research articles. The list is long; to name a few: Berkenkotter and Huckin (1995); Montgomery (1996); Salager-Meyer (1998); Atkinson (1999); Valle (1999); Gross et al. (2002). The work of these scholars covers research articles from different disciplines. However, such scholarly work in the field of Computer Science is limited. The studies by Cooper (1985), Posteguillo (1999) and Anthony (1999) are either too broad or too narrow. Compared with these works, the present study addresses the issue at greater length and in more depth. The increasing availability of computerized text corpora containing millions of words inspired the use of corpus-based techniques in the present research. A corpus of 56 research articles was compiled electronically. These articles were taken from five different journals of the IEEE, the world's leading computer society. WordSmith Tools functions such as word frequency lists, keywords, collocation and concordance were applied to the corpus. Second, Swales' (2004) CARS (Create a Research Space) model was applied for the rhetorical analysis.
Lexico-grammatical analysis was carried out in terms of the rhetorical objectives of writing Introductions. The findings, discussed in Chapter Four, focus on the syntactic and lexical patterns evident in the data. These include notable N-grams (three- and four-word clusters), the authorial voice (markedly different from that of authors in other disciplines) and the passivization of verbs. These stylistic observations make an initial contribution to our understanding of Computer Science research article Introductions. The last three chapters of the dissertation form the core of the discourse analysis of the 56 Introductions in the corpus. They examine the structural-rhetorical features of the moves and steps involved, and the possible links between form and function. A revised CARS model is proposed for writing Introductions to computer science research articles. Some recommendations are also put forward. The dissertation concludes with a note on the pedagogical relevance of the study.
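The corpus procedures named in this abstract (a word frequency list, three- and four-word clusters, and concordance lines) can be approximated in a few lines of Python. This is a minimal sketch assuming plain-text input and simple regex tokenization; it does not reproduce WordSmith Tools, and the function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; internal hyphens and apostrophes stay in the word."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_list(tokens: list[str], top: int = 10) -> list[tuple[str, int]]:
    """Most frequent word forms, as in a basic word frequency list."""
    return Counter(tokens).most_common(top)


def clusters(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-word clusters (n-grams), e.g. n=3 or n=4."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens: list[str], node: str, width: int = 3) -> list[str]:
    """Key-word-in-context lines showing `width` words on each side of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    corpus = ("In this paper we propose a new method. In this paper we also "
              "evaluate the method on three data sets.")
    toks = tokenize(corpus)
    print(frequency_list(toks, top=5))
    print(clusters(toks, 4).most_common(1))
    print("\n".join(concordance(toks, "method")))
```

On a real corpus of Introductions, recurring clusters such as "in this paper we" are the kind of lexical pattern the abstract describes; keyword and collocation statistics would additionally need a reference corpus and an association measure, which this sketch leaves out.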
