
Discriminative Clustering Algorithms for Document Understanding, Tag Recommendation, and Web Surfer Behavior Prediction


Hassan, Malik Tahir




Lahore University of Management Sciences










Applied Sciences





2021-02-17 19:49:13


2024-03-24 20:25:49





The Web is a goldmine of knowledge, but realizing its value requires effective and efficient discovery algorithms. Information on the Web ranges from textual documents to social content to usage patterns. Such information is huge and dynamic in nature, making useful knowledge discovery a challenging task. In recent years, data mining techniques have been used successfully for various knowledge discovery tasks. Data clustering, in particular, has two key advantages for Web mining: (1) it is an unsupervised technique that does not require labeled data; (2) it is a conceptually simple task that can produce readily understandable patterns. In this thesis, we develop and evaluate discriminative clustering algorithms for textual document understanding, social content tag recommendation, and Web surfing behavior analysis. Our discriminative clustering algorithms are efficient and semantically rich for effective knowledge discovery on the Web. For textual document clustering and understanding, we develop and evaluate a new algorithm called CDIM (Clustering via Discrimination Information Maximization). CDIM is an iterative partitional clustering algorithm that maximizes the sum of discrimination information provided by documents in the collection. A key advantage of CDIM is that its clusters are describable by their highly discriminating terms, or equivalently, their highly topically-related terms. This is achieved by incorporating into the clustering algorithm statistically sound measures of discrimination that have been shown to convey the semantic relatedness of terms to topics. A hierarchical version of CDIM is also presented. CDIM’s superior performance is demonstrated on benchmark datasets in comparison with current state-of-the-art text clustering algorithms. For social content tag recommendation, we develop a model of contents and tags using CDIM for recommending tags for new content.
User textual posts (contents) are clustered to yield a list of discriminative terms for each cluster. Likewise, textual tagging history is clustered to produce another list of terms. These lists are combined with the user’s personal tagging history, if available, to produce the final tag recommendations. Our approach is evaluated on data from the social bookmarking system Bibsonomy. We observe that recommendation accuracy can be improved by updating the recommendation model from time to time. To realize this efficiently, we build a self-optimizing version of our tag recommendation system. The self-optimization strategy decides when and how to update the system by solving a nonlinear optimization problem, constrained by the available time, to choose the best clustering parameters (number of clusterable records and number of clusters). A better alternative to rebuilding the complete clustering models is to correct clusters that are becoming outdated and contributing to errors. We achieve this by developing a self-calibration strategy for our system, which is shown to be a better and more practical option. We also analyze personalized and non-personalized versions of our tag recommendation system. Besides our discriminative clustering-based tag recommendation algorithm, the performance of other algorithms, including PITF (Pairwise Interaction Tensor Factorization), FolkRank, and adapted PageRank, is analyzed on our proposed personalization groups (beginners, followers, and leaders) in folksonomies. For Web surfer behavior analysis, we find patterns of Web navigation paths among users and then develop discriminative and generative models for predicting future paths of users. Navigation patterns or behaviors are discovered by adapting the k-modes clustering algorithm with a new similarity measure appropriate for comparing navigation paths and a new method for cluster initialization.
Our experiments, conducted on two real-world datasets, demonstrate that predictions based on navigation behaviors are not necessarily better, because of the diversity of behaviors on the Web. Likewise, we find that including the start time of navigation sessions in prediction models has little effect on accuracy but significantly reduces efficiency. On the other hand, predictions based on cluster centroids are very cost-efficient without significant loss in accuracy. This thesis demonstrates the usefulness and versatility of clustering algorithms for Web mining, and highlights the importance of semantics in textual document analysis and self-management in practical Web systems. Directions for future work include semantic enhancements to CDIM and the development of self-management strategies for data mining applications.
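
As a rough illustration of the k-modes idea mentioned in the abstract, the sketch below clusters fixed-length navigation paths and predicts a next page from the nearest cluster mode. It is not the thesis's method: the abstract does not specify its new similarity measure or its initialization scheme, so this sketch assumes the standard simple-matching dissimilarity and a naive "first k distinct paths" initialization. The function names (`mismatch`, `column_modes`, `k_modes`, `predict_next`) and the sample paths are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple-matching dissimilarity: number of compared positions that differ.

    Assumption: this stands in for the thesis's own (unspecified) measure.
    zip() stops at the shorter sequence, so a prefix can be compared to a mode.
    """
    return sum(x != y for x, y in zip(a, b))


def column_modes(paths):
    """Most frequent page at each position; ties go to the value seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*paths))


def k_modes(paths, k, max_iter=20):
    """Cluster equal-length paths (tuples of page names) into at most k groups."""
    # Assumed initialization: the first k distinct paths, in input order.
    modes = list(dict.fromkeys(paths))[:k]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each path joins its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(p, modes[j]))
            for p in paths
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: recompute each mode position by position.
        for j in range(len(modes)):
            members = [p for p, lab in zip(paths, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


def predict_next(prefix, modes):
    """Predict the next page from the mode closest to an observed prefix.

    Assumes the prefix is shorter than the modes.
    """
    best = min(modes, key=lambda m: mismatch(prefix, m))
    return best[len(prefix)]


# Hypothetical sessions, each a path of three page categories.
paths = [
    ("home", "news", "sports"),
    ("home", "shop", "cart"),
    ("home", "news", "weather"),
    ("home", "shop", "checkout"),
    ("home", "news", "sports"),
    ("home", "shop", "cart"),
]
labels, modes = k_modes(paths, k=2)
print(labels)                                # [0, 1, 0, 1, 0, 1]
print(predict_next(("home", "shop"), modes))  # cart
```

Predicting from the modes alone, rather than from every stored session, is one way to read the abstract's remark that centroid-based predictions are cost-efficient: each prediction only compares the prefix against k modes.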

Similar Books


Similar Chapters


Similar News


Similar Articles


Similar Article Headings


Section Three: Is Free Verse a Kind of Prose?

Some writers and poets considered free verse a kind of prose and said that its meanings are trivial and without sense. Others said that its roots are found in the Andalusian muwashshahat, and that the band was known but its style was unknown and was composed only by the poets of Iraq. Nazik al-Malaika rejected this, however, and said that she had not heard of the band before 1953. Elsewhere, Nazik al-Malaika said of free verse: "Perhaps the clearest evidence that the movement was born of our own age is that the majority of our readers still denounce and reject it, and among them is a considerable number who think that free verse has nothing of poetry but the name, and that it is ordinary prose without meter."
Was the free verse movement strong or not?
This new movement (the free verse movement) was strong, firmly rooted, steady, and enthusiastic. At the outset, like any new movement, it slipped and stumbled, but after a period of time it reached maturity and became a well-known and accepted movement. This new poetic call continued to spread until it had established a strong position for itself, and some distinguished poets began to abandon the two-hemistich style and use the single-hemistich style.
As for Nazik al-Malaika, she was very intelligent, and through her intelligence she was able to take the lead and a unique distinction among the distinguished poets. She was perceptive and highly ambitious. Nazik al-Malaika presented evidence and proofs until she brought poets, writers, critics, and readers to concede the matter.

The Reformist Vision of Imam Nursi: Its Impact and Reach in the World

Reform is a process that requires a holistic approach to its meaning, one in which the theoretical side meets the applied, with clarity of vision and method. It takes the existing situation into account and proceeds from it by consolidating what is sound in it and correcting what has decayed, in order to move to a new and better situation. This paper therefore sheds light on the thought and contribution of the scholar Bediuzzaman Nursi, with the aim of grasping the particulars of his reform project and studying its extended impact at the intellectual, political, and civilizational levels. With his ideas and contributions, Nursi crossed the boundaries of time and place and marked an important civilizational turning point, both in his comprehensive treatment of the concepts of this great religion and in his presentation of pioneering reform projects that reformers need. Accordingly, there remains a need for in-depth, successive studies of his reform project, from multiple angles, to guide efforts toward advancement and civilizational renewal. Keywords: Imam Nursi, renewal, reform, reformist method.

Corporate Entrepreneurship, Agency Cost and Firm Performance: Evidence from Developed and Developing Economies

This study aims to extend the relationship between corporate entrepreneurship and agency cost to firm performance. It also examines this relationship in the presence of behavioral biases, to address the behavioral finance approach, and validates it in a developed economy (USA) and a developing economy (Pakistan) in order to generalize the findings. The dissertation is designed to investigate the relationship among corporate entrepreneurship, agency cost, and firm performance across both the behavioral and the traditional approaches to finance. A validated construct was adopted to measure the corporate entrepreneurship, behavioral biases, and risk perception of USA and Pakistani non-financial sector companies listed on the New York Stock Exchange (NYSE) and the Karachi Stock Exchange (KSE), respectively. The data for firm performance and agency cost were taken from Balance Sheets Analyses (SBP Report) for Pakistani companies and from annual reports for the USA companies, as a three-year average (2009, 2010 and 2011). The findings highlight a significant negative relationship between corporate entrepreneurship and agency cost in the USA, showing that corporate entrepreneurship can be an effective means of reducing agency problems within organizations, ultimately leading to high performance. In the Pakistani context, however, the relationship between corporate entrepreneurship and agency cost is insignificant. Regarding the behavioral finance approach, neither economy showed a significant effect of behavioral biases on risk perception; however, risk perception had a significant effect on corporate entrepreneurship, which suggests that behavioral biases did not affect corporate finance decisions. This indicates that corporate finance decisions may differ from person to person irrespective of culture or country, pointing towards an individualistic approach.
This study provides a foundation for future studies on the relationship among corporate entrepreneurship, agency cost, and firm performance. It helps executives assess their own situation when making entrepreneurial and financial decisions within companies, and shows how to control or reduce the impact of behavioral biases in particular situations in order to maximize their return. Keywords: Corporate Entrepreneurship, Agency Cost, Firm Performance, Behavioral Biases, and Corporate Financial Decisions.