
Discriminative Clustering Algorithms for Document Understanding, Tag Recommendation, and Web Surfer Behavior Prediction

Thesis Info

Access Option

External Link

Author

Hassan, Malik Tahir

Program

PhD

Institute

Lahore University of Management Sciences

City

Lahore

Province

Punjab

Country

Pakistan

Thesis Completion Year

2013

Thesis Completion Status

Completed

Subject

Applied Sciences

Language

English

Link

http://prr.hec.gov.pk/jspui/handle/123456789/1282

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676725909454


The Web is a goldmine of knowledge, but realizing its value requires effective and efficient discovery algorithms. Information on the Web ranges from textual documents to social content to usage patterns. Such information is vast and dynamic in nature, which makes useful knowledge discovery a challenging task. In recent years, data mining techniques have been applied successfully to various knowledge discovery tasks. Data clustering, in particular, has two key advantages for Web mining: (1) it is an unsupervised technique that does not require labeled data; (2) it is a conceptually simple task that can produce readily understandable patterns. In this thesis, we develop and evaluate discriminative clustering algorithms for textual document understanding, social content tag recommendation, and Web surfing behavior analysis. Our discriminative clustering algorithms are efficient and semantically rich, enabling effective knowledge discovery on the Web.

For textual document clustering and understanding, we develop and evaluate a new algorithm called CDIM (Clustering via Discrimination Information Maximization). CDIM is an iterative partitional clustering algorithm that maximizes the sum of the discrimination information provided by the documents in the collection. A key advantage of CDIM is that its clusters can be described by their highly discriminating terms or, equivalently, by their most topically related terms. This is achieved by incorporating into the clustering algorithm statistically sound measures of discrimination that have been shown to convey the semantic relatedness of terms to topics. A hierarchical version of CDIM is also presented. CDIM's superior performance is demonstrated on benchmark datasets in comparison with current state-of-the-art text clustering algorithms. For social content tag recommendation, we develop a model of contents and tags using CDIM to recommend tags for new content.
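The CDIM loop summarized above can be illustrated with a minimal Python sketch. It makes several assumptions. Each term is scored for each cluster with a smoothed relative-risk ratio computed from document frequencies. A term counts as discriminating for a cluster when its score exceeds 1. Each document is then moved to the cluster for which it provides the most discrimination information. The function names `discrimination_scores`, `cdim_sketch`, and `top_terms` are hypothetical, and the thesis's exact measure, weighting, and initialization may differ.

```python
from collections import Counter


def discrimination_scores(docs, labels, k, smooth=1.0):
    """Score every term for every cluster (assumed measure, not the thesis's).

    score(t, c) = P(t in doc | doc in c) / P(t in doc | doc not in c),
    estimated from document frequencies with additive smoothing.
    """
    vocab = sorted({t for d in docs for t in d})
    scores = []
    for c in range(k):
        inside = [set(d) for d, lab in zip(docs, labels) if lab == c]
        outside = [set(d) for d, lab in zip(docs, labels) if lab != c]
        df_in = Counter(t for d in inside for t in d)
        df_out = Counter(t for d in outside for t in d)
        scores.append({
            t: ((df_in[t] + smooth) / (len(inside) + 2 * smooth))
            / ((df_out[t] + smooth) / (len(outside) + 2 * smooth))
            for t in vocab
        })
    return scores


def cdim_sketch(docs, k, init_labels, max_iter=20):
    """Reassign documents until the partition stops changing."""
    labels = list(init_labels)
    scores = discrimination_scores(docs, labels, k)
    for _ in range(max_iter):
        new_labels = []
        for d in docs:
            terms = set(d)
            # Discrimination information d provides for cluster c: summed
            # scores of its terms that are discriminating for c (score > 1).
            info = [sum(s[t] for t in terms if s[t] > 1.0) for s in scores]
            new_labels.append(max(range(k), key=info.__getitem__))
        if new_labels == labels:
            break
        labels = new_labels
        scores = discrimination_scores(docs, labels, k)
    return labels, scores


def top_terms(scores, c, n=3):
    """Describe cluster c by its n most discriminating terms."""
    return sorted(scores[c], key=scores[c].get, reverse=True)[:n]


if __name__ == "__main__":
    docs = [
        ["ball", "goal", "team"], ["goal", "team", "match"], ["ball", "match", "team"],
        ["vote", "party", "election"], ["party", "election", "poll"], ["vote", "poll", "party"],
    ]
    labels, scores = cdim_sketch(docs, 2, [0, 0, 0, 1, 1, 0])
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(top_terms(scores, 0, n=1), top_terms(scores, 1, n=1))  # ['team'] ['party']
```

In the toy run, one politics post starts in the sports cluster. It is reassigned in the first pass, and the loop stops on the second. Each cluster is then labeled by its highest-scoring term, mirroring how CDIM clusters are described by their discriminating terms.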
In this tag recommendation model, user textual posts (contents) are clustered to yield a list of discriminative terms for each cluster. Likewise, textual tagging history is clustered to produce another list of terms. These lists are combined with the user's personal tagging history, if available, to produce the final tag recommendations. Our approach is evaluated on data from the social bookmarking system BibSonomy. We observe that recommendation accuracy can be improved by updating the recommendation model from time to time. To do this efficiently, we build a self-optimizing version of our tag recommendation system. The self-optimization strategy decides when and how to update the system by solving a nonlinear optimization problem, constrained by the available time, that selects the best clustering parameters (the number of clusterable records and the number of clusters). A better alternative to rebuilding the complete clustering models is to correct only those clusters that are becoming outdated and contributing to errors. We achieve this by developing a self-calibration strategy for our system, which is shown to be a better and more practical option. We also analyze personalized and non-personalized versions of our tag recommendation system. Besides our discriminative clustering based tag recommendation algorithm, we analyze the performance of other algorithms, including PITF (Pairwise Interaction Tensor Factorization), FolkRank, and adapted PageRank, on our proposed personalization groups in folksonomies (beginners, followers, and leaders).

For Web surfer behavior analysis, we find patterns in users' Web navigation paths and then develop discriminative and generative models for predicting users' future paths. Navigation patterns, or behaviors, are discovered by adapting the k-modes clustering algorithm with a new similarity measure suited to comparing navigation paths and a new method for cluster initialization.
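The navigation-pattern step can be pictured with a basic k-modes loop over sessions. This sketch is not the thesis's method. Sessions are truncated or padded to a fixed length. The hypothetical `similarity` function simply counts positions where two sessions visit the same page. Modes are seeded with the first k distinct sessions. The thesis replaces both the similarity measure and the initialization with its own, and neither is reproduced here. `PAD`, `pad`, and `kmodes_paths` are likewise illustrative names.

```python
from collections import Counter

PAD = "-"  # hypothetical filler for sessions shorter than the fixed length


def pad(path, length):
    """Truncate or pad a session (a list of page ids) to a fixed length."""
    return tuple(path[:length]) + (PAD,) * max(0, length - len(path))


def similarity(a, b):
    """Placeholder path similarity: positions where both visit the same page."""
    return sum(1 for x, y in zip(a, b) if x == y and x != PAD)


def kmodes_paths(paths, k, length=4, max_iter=10):
    data = [pad(p, length) for p in paths]
    # Naive seeding with the first k distinct sessions (not the thesis's method).
    modes = list(dict.fromkeys(data))[:k]
    labels = None
    for _ in range(max_iter):
        # Assign each session to its most similar mode (ties go to the lowest index).
        new_labels = [
            max(range(len(modes)), key=lambda c: similarity(x, modes[c]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update each mode position by position to its most frequent page.
        for c in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


if __name__ == "__main__":
    sessions = [
        ["home", "news", "sports"], ["shop", "cart", "checkout"],
        ["home", "news", "politics"], ["shop", "cart", "checkout", "receipt"],
        ["home", "news", "sports", "scores"], ["shop", "product", "cart"],
    ]
    labels, modes = kmodes_paths(sessions, 2)
    print(labels)  # [0, 1, 0, 1, 0, 1]
```

Unlike k-means, k-modes keeps each cluster center as an actual sequence of pages. This makes it a natural fit for categorical data such as navigation paths.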
Our Web navigation experiments, conducted on two real-world datasets, demonstrate that predictions based on navigation behaviors are not necessarily better, because of the diversity of behaviors on the Web. Likewise, we find that including the start time of navigation sessions in the prediction models has little effect on accuracy but significantly degrades efficiency. On the other hand, predictions based on cluster centroids are very cost-efficient without a significant loss in accuracy. This thesis demonstrates the usefulness and versatility of clustering algorithms for Web mining, and highlights the importance of semantics in textual document analysis and of self-management in practical Web systems. Directions for future work include semantic enhancements to CDIM and the development of self-management strategies for data mining applications.
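A minimal sketch of centroid-based prediction, under stated assumptions, shows why it is cheap. Centroids are assumed to be fixed-length page sequences, such as cluster modes learned from past sessions. A user's partial session is matched against each centroid, and the centroid's page at the next position is returned. Only the k centroids are scanned, not every stored session. `predict_next` and `prefix_similarity` are hypothetical names, and the thesis's actual discriminative and generative prediction models are not reproduced.

```python
PAD = "-"  # filler used in the fixed-length centroids


def prefix_similarity(prefix, centroid):
    """Count positions where the observed pages agree with the centroid."""
    return sum(1 for x, y in zip(prefix, centroid) if x == y)


def predict_next(prefix, centroids):
    """Predict the next page of a partial session from its closest centroid.

    Each prediction compares the prefix with k centroids of length L, so the
    cost is O(k * L), independent of the number of training sessions.
    Returns None when the closest centroid has no page at the next position.
    """
    best = max(centroids, key=lambda m: prefix_similarity(prefix, m))
    pos = len(prefix)
    if pos < len(best) and best[pos] != PAD:
        return best[pos]
    return None


if __name__ == "__main__":
    # Hypothetical centroids, e.g. modes from a k-modes run over past sessions.
    centroids = [("home", "news", "sports", PAD), ("shop", "cart", "checkout", PAD)]
    print(predict_next(["shop", "cart"], centroids))  # checkout
```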

Similar Article Headings

Maulana Syed Shah Rizwanullah Qadri Mujibi

It is with sorrow that we report that Maulana Syed Shah Rizwanullah Qadri Mujibi, the Sajjada Nashin (spiritual head) of Khanqah Mujibia, Phulwari Sharif, Patna, passed away on 31 December 2003. Inna lillahi wa inna ilayhi raji'un (Indeed, we belong to Allah, and to Him we shall return).
At the stage of life he had reached, it was not yet the time to depart, but who can interfere with the divine will? The time of death is fixed: "And when their term arrives, they can neither delay it by an hour nor advance it." [Al-A'raf: 34]
The scholarly and spiritual blessings of Khanqah Mujibia have flowed for a very long time. Shah Sahib carried on its ancient traditions and embodied the distinctive qualities and virtues of his eminent forebears; he was himself a learned scholar who imparted spiritual benefit, and a devout elder engaged in remembrance of God (dhikr) and spiritual exercises. Thousands of seekers and disciples were benefiting from him, but now this fountainhead of purification, reform, and guidance has ceased to flow.
He had inherited piety and sincerity, regularity in fasting and prayer, simplicity and an ascetic way of life, and a cheerful and courteous disposition. The present writer had the opportunity to visit him two or three times and to witness his pure character and virtuous life; each time he received me with a warm smile and treated me with kindness and grace. May Allah elevate his ranks and grant those he left behind beautiful patience. Amen.
(Ziauddin Islahi, February 2004)

A Research Review of Qalbu Siyyr, the First Book in Pashto on the Sirah of the Prophet ﷺ

Pashto is the national language of Afghanistan and one of the major languages spoken in Khyber Pakhtunkhwa (KPK), Pakistan. According to one study, it has a history of about seven thousand years. Its speakers are called Pathans or Afghans, and as a nation they are Muslims. A large part of its literature concerns Islamic studies. The oldest book in Pashto on the biography of the Holy Prophet is considered to be "Qalbu Siyyr". This article presents a research review of this book.

Performance Management System: A Comparative Study of Pakistani and Foreign Banking Sectors

Every organization, be it small or large and be it 'for-profit' or 'not-for-profit', wants to outperform its competitors by maintaining sustained success in its industry. Human resources act as a catalyst in achieving such a unique status in the industry. Once an organization becomes complacent with its performance and stops pursuing continuous improvement, it begins to decline. The main factors responsible for an organization's downfall are either a lack of commitment or nonconformity to established standard procedures. Organizations operate through well-organized and universally agreed-upon systems, one of which is the Performance Management System (PMS). The research under review studies the existing PMS of the two commercial banking sectors (Pakistani and foreign) operating in Karachi and then compares the two sectors to identify the strengths and weaknesses of their respective PMS. To cover the population of the two banking sectors, eight banks (four from each sector) were selected. Another salient aim of the research has been to recommend a sound methodology to the banking sectors for achieving sustainable economic stimulation through PMS, HRD, and employee compensation. The study suggests developing human resources at the national level in general, and in financial institutions in particular, through effective Performance Management System practices. The banking industry is presently facing a hyper-turbulent situation in which banks have to operate in increasingly competitive and complex local and global markets. The ability to compete in the fast-paced global environment is of paramount importance; survival of the fittest is the name of the game. Competition has become quite tough and challenging, especially due to globalization and the entry of many world-class foreign banks and other financial institutions.
The central idea behind the research was to identify why our domestic banking sector fails to compete successfully in the prevailing competitive environment. To cover various aspects of performance management, a sample of 400 managers (50 from each of the eight selected banks of the two sectors) was selected for data collection. Since all the banks under study are located in Karachi, face-to-face interaction was possible in most cases, in addition to other data collection methods. A questionnaire of 50 questions was prepared to ensure that all areas of PMS, HRD, and the compensation system were thoroughly covered. After the relevant data were collected through surveys and interviews, a comprehensive analysis was carried out by comparing the PMS practices followed in the banks with those described in world-renowned PMS models and in the literature. SPSS was used extensively to produce statistical presentations of the analysis results. The consolidated analysis indicates that many managers lack a clear understanding of PMS at the macro level. They seem to think that PMS is meant only to raise employees' salaries and to take other administrative actions. They do not see PMS as a complete system for enhancing overall organizational performance, nor can they link it with employee development and the overall compensation system. It was found that, although most of the banks are trying to implement PMS along with related HRD and compensation programs, there is still much room for improvement. In Pakistani banks, the impact of local culture is quite prominent, and managers take many shortcuts when evaluating employees' performance. Based on the analysis, recommendations are made for better planning and implementation of PMS in the banking industry.
It is strongly hoped that, through effective use of PMS, the financial sector can develop its workforce and play a vital role in bringing about substantial improvement in Pakistan's economy. Besides proposing measures to improve employee performance through effective implementation of PMS, the study also offers suggestions for future researchers to keep this line of research ongoing.