Biomedical Data Retrieval Using Enhanced Query Expansion

Thesis Info

Access Option

External Link


Muhammad Qadeer


Virtual University of Pakistan

Institute Type








Thesis Completing Year


Thesis Completion Status



Software Engineering





2021-02-17 19:49:13


2024-03-24 20:25:49




Biomedical data is growing rapidly, and better retrieval systems are needed to make use of it. A basic problem when retrieving data for a query is vocabulary mismatch: queries and stored documents often use different words to express the same concept. Two techniques are commonly used to address this problem: query paraphrasing and query expansion. In query paraphrasing, the query is rewritten using synonyms of its terms. Query expansion techniques are further categorized as local or global. Local query expansion analyzes the top-ranked documents retrieved for a query; various ranking models have been introduced to rank the documents in a collection based on terms and features. A pool of candidate expansion terms is obtained from these top-ranked documents. After feature selection from the term pool, the selected candidate expansion terms still include a few terms that cause query drift. To overcome this problem, semantic filtering was used, for which semantic similarity measures are the basic building block. Global query expansion, by contrast, relies on analysis of the whole collection to discover word relationships. Synonyms of query words are extracted from a dictionary or thesaurus. In this research, we evaluated well-known probabilistic ranking models, namely LM-Dirichlet, LM Jelinek-Mercer and BM25, for biomedical data retrieval. We performed the experimental analysis iteratively, using diverse preprocessing techniques, on 36 biomedical queries. The TREC Genomics data set, a standard biomedical collection, was used as the basis for all experiments. BM25 was observed to be the best information retrieval model for biomedical data.
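To make the comparison of ranking models concrete, the following is a minimal sketch of the three scoring functions named above, using their commonly cited textbook forms. The toy corpus, the parameter defaults (`k1`, `b`, `mu`, `lam`) and all function names are assumptions for illustration; the thesis does not state which implementation or parameter settings were used.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only (not TREC Genomics data).
docs = {
    "d1": "gene expression in breast cancer cells".split(),
    "d2": "protein binding and gene regulation".split(),
    "d3": "clinical trial of cancer drug".split(),
}

N = len(docs)                                          # number of documents
avgdl = sum(len(d) for d in docs.values()) / N         # average document length
df = Counter(t for d in docs.values() for t in set(d)) # document frequency
cf = Counter(t for d in docs.values() for t in d)      # collection frequency
total_terms = sum(cf.values())

def bm25(query, doc, k1=1.2, b=0.75):
    """Okapi BM25 with a non-negative idf variant (assumed parameters)."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if tf[t] == 0:
            continue
        idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
        norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[t] * (k1 + 1) / norm
    return score

def lm_dirichlet(query, doc, mu=2000):
    """Query log-likelihood with Dirichlet-smoothed document model."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = cf[t] / total_terms
        if p_c == 0:
            continue  # term unseen in the collection: skipped in this sketch
        score += math.log((tf[t] + mu * p_c) / (len(doc) + mu))
    return score

def lm_jelinek_mercer(query, doc, lam=0.7):
    """Query log-likelihood; lam is the weight on the collection model."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = cf[t] / total_terms
        if p_c == 0:
            continue  # term unseen in the collection: skipped in this sketch
        score += math.log((1 - lam) * tf[t] / len(doc) + lam * p_c)
    return score

def rank(scorer, query):
    """Return document ids sorted by descending score."""
    return sorted(docs, key=lambda d: scorer(query, docs[d]), reverse=True)
```

Calling `rank(bm25, ["gene", "cancer"])` orders the toy documents by BM25 score; swapping in `lm_dirichlet` or `lm_jelinek_mercer` gives the corresponding language-model rankings for the same query.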
We used different term scoring techniques, namely Baseline, BNS, Chi-Square, Codice, BIM, KLD, LRF, PRF and RSV, to score the terms related to the query. Comparing the MAP scores averaged over all queries showed that BNS was the best term scoring technique for biomedical data. Semantic similarity measures (Path-based, Wu and Palmer, and Leacock and Chodorow (LCH)) were then applied to the terms extracted by BNS to select the most appropriate terms for query expansion. Finally, the queries were expanded with the most similar terms each time, documents were retrieved using the expanded queries, and the resulting MAP scores were evaluated to draw the final conclusions of this research. Query expansion improved biomedical data retrieval, and LCH was found to be the best semantic similarity measure for query expansion in a biomedical data retrieval system.
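The semantic filtering step can be illustrated with a small sketch of the three similarity measures over an is-a taxonomy. The taxonomy below, the function names and the `filter_terms` helper are hypothetical; the formulas follow the conventions commonly used for WordNet-style hierarchies (path length counted in nodes, root at depth 1), which may differ from the exact setup used in the thesis.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent), for illustration only.
parent = {
    "entity": None,
    "disease": "entity",
    "cancer": "disease",
    "carcinoma": "cancer",
    "infection": "disease",
    "substance": "entity",
    "drug": "substance",
}

def ancestors(node):
    """Path from the node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

MAX_DEPTH = max(depth(n) for n in parent)

def lcs(a, b):
    """Least common subsumer: deepest shared ancestor of a and b."""
    seen = set(ancestors(b))
    return next(n for n in ancestors(a) if n in seen)

def path_edges(a, b):
    """Number of edges on the shortest is-a path between a and b."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b))

def path_sim(a, b):
    return 1.0 / (path_edges(a, b) + 1)

def wup_sim(a, b):
    """Wu and Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

def lch_sim(a, b):
    """Leacock and Chodorow: -log(path length in nodes / (2 * max depth))."""
    return -math.log((path_edges(a, b) + 1) / (2.0 * MAX_DEPTH))

def filter_terms(query_term, candidates, sim, k=2):
    """Keep the k candidates most similar to the query term (assumed policy)."""
    return sorted(candidates, key=lambda c: sim(query_term, c), reverse=True)[:k]
```

For example, `filter_terms("carcinoma", ["drug", "cancer", "infection"], lch_sim)` keeps the two candidates closest to the query term in the taxonomy and drops the distant one, which is the kind of term that would otherwise contribute to query drift.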

Similar Articles


Similar Article Headings


Hakim Muhammad Zaman al-Husaini

The Death of Hakim Muhammad Zaman Husaini
Who could have known that the twentieth century, even as it departed, would deal the Muslim ummah a blow from which it would not recover for a long time? The religious scholar, commentator on the Quran, author on Islam, statesman and thinker Hazrat Maulana Hakim Muhammad Zaman Husaini departed this transient world in the holy month of Ramadan and, reaching the eternal realm, met his true Master. Inna lillahi wa inna ilayhi raji'un.
Condolences on his sad demise will be offered throughout the Islamic world, because his passing has brought grief and loss to the entire Islamic world. His life was as though dedicated to the service of the Islamic world. Through his writings, speeches and books he gave the Islamic world true guidance and service. He was selfless and sincere, indifferent to rank and office; only in the service of the faith did he find peace, contentment, comfort and joy. He was among the special students of Shaykh al-Islam Hazrat Maulana Syed Hussain Ahmad Madani. His thinking was sound and truthful, he was a man of lofty character, and simplicity ran in his veins. Like Rais al-Ahrar Maulana Muhammad Ali Jauhar, his temperament was full of zeal and passion. Like Hazrat Maulana Abdul Majid Daryabadi he was broad-minded, and he benefited from the scholarly company and gatherings of Hazrat Maulana Syed Abul Hasan Ali Nadwi. He was also counted among the devoted students of Mufakkir-e-Millat Hazrat Mufti Atiqur Rahman Usmani, who recognized and acknowledged his knowledge and thought. Hazrat Maulana Hakim Muhammad Zaman Husaini's speeches on the life of the Prophet were well worth hearing. The Prime Minister of India, Rajiv Gandhi, sat from beginning to end at a gathering on the life of the Prophet to hear his speech, and all the events of the daily life of the Prophet ﷺ, kindness to neighbours, the best treatment of non-Muslims, towards the enemies of Islam the Noble Prophet...

The Status of the Sunnah in the View of the Ahl al-Quran

This article is about the misunderstandings of the "Ah-lul-Quran" regarding "Al-Sunnah". They call themselves Ah-lul-Quran, though they do not deserve this title. They deny the authenticity of the Hadith as well as the work of the Muhaddithin, following in the footsteps of their spiritual mentors, who are primarily orientalists such as Sprenger, William Muir and Goldziher. In the subcontinent, the treacherous denial of Ahadith was in fact the outcome of conspiracies hatched by the imperial world. The major misunderstandings of the Ah-lul-Quran regarding the Sunnah arise from the following:
* The status of the Prophet (SAW) in their eyes.
* Their view that the Sunnah was not compiled during the time of prophethood.
* Doubts about the ahadith as fabrications.
Their views are based on nothing but misconceptions and ill will against Islam. They deny not only the Sunnah but also the Quran. This paper refutes the objections raised by the Ah-lul-Quran through in-depth analysis and valid references.

Single Nucleotide Polymorphism-Based Association Studies of Bladder Cancer Patients

A growing number of studies are being conducted in different parts of the world to understand the genetic etiology of urinary bladder cancer (UBC), which is a life-threatening disorder. To find the susceptibility loci, we conducted a case-control genetic association study on Pakistani urothelial carcinoma patients (N = 200) and healthy controls (N = 200). For this purpose, four types of sequence variations were studied, viz. VNTR polymorphism of eNOS, Alu repeat variation of the ACE gene, null polymorphisms of the GSTT and GSTM genes, and selected common variants of the GSTP1, MTHFR, PSCA, TNFα, p21, TP53, CYP1B1, XPD, XRCC1, CAV1, PON1, IGFBP3, VEGFA, LEP, LEPR and PPARγ genes as well as the intergenic 8q24 region. In addition to an overall risk assessment, these polymorphisms were also analyzed with respect to smoking status as well as tumor grade and stage. Haplotype-based association analysis of variants residing in linkage disequilibrium was also carried out, and gene-gene interaction was studied through combined genotype analysis of functionally related genes. The risk variants of GSTM, LEPR, ACE, PSCA and the 8q24.21 locus (rs9642880 and rs6983267) were found to be associated with significantly higher risk, while the IGFBP3 variant and haplotypes of CAV1 and MTHFR were found to be associated with reduced risk of UBC in the overall comparison of cases and controls. In the gene-smoking interaction, CYP1B1, p21 (Ser allele), ACE and rs9642880 conferred a high UBC risk in smokers, while LEPR and PSCA variants were found to be associated with increased risk of bladder oncogenesis in non-smokers only. In addition, p21 (Arg allele) was found to be associated with reduced UBC susceptibility in smokers, while IGFBP3 and CAV1 haplotypes protected against urothelial carcinoma of the bladder in non-smokers only.
GSTM0 and the risk allele of rs6983267 did not show a gene-smoking interaction because of their significant risk contribution in both smoker and non-smoker groups. With reference to tumor grade and stage, a trend of similar genetic etiologies was observed in low grade and non-invasive tumors, while the high grade and invasive tumor types were also found to have common genetic etiologies, which were different from those of the former group. GSTM0, LEPR and rs9642880 were found to be associated with enhanced risk of low grade as well as non-invasive bladder carcinoma. GSTT0, CAV1, PSCA and PPARγ were found to predispose individuals to an elevated risk of high grade and invasive tumors. ACE and rs6983267 were non-specifically associated with both low and high grades as well as with non-invasive and invasive tumors. The IGFBP3 SNP protected against low and high grade as well as against the non-invasive stage. The haplotypes of MTHFR were found to confer a high risk of non-invasive tumor while providing protection against MIBC. In brief, the present study revealed the association of some of the genetic variants with overall disease susceptibility, in addition to some gene-smoking and gene-gene interactions.