Boosting Based Multiclass Ensembles and Their Applications in Machine Learning

Thesis Info

Access Option: External Link
Author: Mirza Mubasher Baig
Program: PhD
Institute: Lahore University of Management Sciences
City: Lahore
Province: Punjab
Country: Pakistan
Thesis Completing Year: 2016
Thesis Completion Status: Completed
Subject: Computer Science
Language: English
Link: http://prr.hec.gov.pk/jspui/bitstream/123456789/9774/1/Thesis%20of%20Mirza%20Mubashir%20Baig%202004-03-0040.pdf
Added: 2021-02-17 19:49:13
Modified: 2024-03-24 20:25:49
ARI ID: 1676727712327

Boosting is a generic statistical process for generating accurate classifier ensembles from a learning algorithm that is only moderately accurate. AdaBoost (Adaptive Boosting) is a machine learning algorithm that iteratively fits a number of classifiers to the training data and combines them linearly to form a final ensemble. This dissertation presents three major contributions to the literature on boosting-based ensemble learning: two multiclass ensemble learning algorithms, a novel way to incorporate domain knowledge into a variety of boosting algorithms, and an application of boosting in a connectionist framework to learn a feed-forward artificial neural network.

The first contribution is a new multiclass boosting algorithm, called M-Boost, that introduces novel rules for selecting and combining classifiers. M-Boost uses a simple partitioning algorithm (i.e., decision stumps) as its base classifier and handles a multiclass problem without breaking it into multiple binary problems. It selects each weak classifier using a global optimality measure, whereas standard AdaBoost variants use a localized greedy approach. It reweights training examples with a confidence-based strategy rather than the standard exponential multiplicative factor. Finally, M-Boost outputs a probability distribution over classes rather than a binary classification decision. The algorithm was tested on eleven datasets from the UCI repository and performed much better on 9 of the 11 in terms of classification accuracy.

A second multiclass ensemble learning algorithm, CBC (Cascaded Boosted Classifiers), creates a multiclass ensemble by learning a cascade of boosted classifiers. It does not require an explicit encoding of the given multiclass problem; rather, it learns a multi-split decision tree and implicitly learns the encoding as well. In this recursive approach, an optimal partition of all classes is selected from the set of all possible partitions and the training examples are relabeled accordingly. The reduced multiclass problem is then learned with a multiclass learner, and the procedure is applied recursively to each partition to learn a complete cascade. M-Boost was chosen as the multiclass learner for the experiments. The algorithm was tested on the network intrusion detection dataset (NIDD) adopted from KDD Cup 99 (KDD'99), which was prepared and managed by MIT Lincoln Labs as part of the 1998 DARPA Intrusion Detection Evaluation Program.

To incorporate domain knowledge into boosting, an entirely new strategy has been devised that can add a prior to any boosting algorithm. The idea is to use the prior to modify the weight distribution over training examples during each iteration. This modification affects which base classifier is selected for the ensemble and thereby incorporates the prior into boosting. Experimental results show that the proposed method improves the convergence rate, improves accuracy, and compensates for a lack of training data.

The dissertation also presents a novel weight adaptation method in a connectionist framework that uses AdaBoost to minimize an exponential cost function instead of the mean squared error. This change was introduced to achieve better classification accuracy, since the exponential loss minimized by AdaBoost is more suitable for learning a classifier. The main contribution here is a new representation of decision stumps that, when used as the base learner in AdaBoost, makes the resulting ensemble equivalent to a perceptron. This boosting-based method for learning a perceptron is called BOOSTRON. The BOOSTRON algorithm has also been extended and generalized to learn a multi-layered perceptron. The generalization uses an iterative strategy together with BOOSTRON to learn the weights of hidden-layer and output neurons by reducing each of these problems to that of learning a single-layer perceptron.
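The abstract describes AdaBoost as iteratively fitting classifiers and combining them linearly, with decision stumps as base learners. As a rough illustration only, the sketch below implements standard binary AdaBoost with threshold stumps in plain Python. It is not M-Boost, CBC, or BOOSTRON from the thesis; the function names (`fit_stump`, `adaboost`, `predict`), the +1/-1 label convention, and the toy data in the usage note are assumptions made for this example.

```python
import math


def fit_stump(X, y, w):
    """Exhaustively search for the threshold stump with the lowest weighted error.

    Returns (error, feature_index, threshold, polarity). A stump predicts
    `polarity` when row[feature] <= threshold and `-polarity` otherwise.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity


def adaboost(X, y, rounds=10):
    """Standard binary AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start from a uniform weight distribution
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error away from 0 and 1 so the logarithm stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Exponential reweighting: misclassified examples gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

For instance, on the one-dimensional toy set `X = [[1], [2], [3], [4], [5], [6]]` with labels `y = [1, 1, -1, -1, 1, 1]`, no single stump separates the two classes, while the three-round ensemble returned by `adaboost(X, y, rounds=3)` should classify all six training points correctly.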
Similar Article Headings

Chapter Six: A Study of Renewable Resources

Allah Almighty has blessed humankind with abundant renewable resources, including hydroelectric power, solar energy, and wind power. The greatest benefit of renewable resources is that they emit very little carbon dioxide. Per kilowatt-hour, solar energy emits 87 grams of carbon dioxide, geothermal 41 grams, wind energy 31 grams, and nuclear energy 52 grams. Hydropower emits anywhere from as little as 1 gram up to 1,500 grams of carbon dioxide per kilowatt-hour. In Germany 30 percent, in China 29 percent, and in Japan 24 percent of electricity is generated from solar energy. Pakistan has a hydropower potential of 100,000 megawatts and a wind energy potential of 50,000 megawatts, and its capacity to generate electricity from biomass is also in the thousands of megawatts. In Pakistan, one kilowatt of energy falls on each square kilometre, from which thousands of megawatts of electricity can be generated. [1]

The Muslim scholar 'Abdul Hamid' writes:

“It is the use of non-renewable resources, those minerals and fossil hydrocarbons whose natural cycles are on a geologic time-scale and are thus practically finite in human terms that are ecologically unsound. It is the rampant exploitation of such non-renewable resources over the past 20 years that has led to the industrial and technological way of life that dominates the planet.”[2]

According to WAPDA officials, 9,700 megawatts will be generated from renewable resources over the next 20 years. NEPRA's annual report...

Sustainable Development of the Ecosystem from an Islamic Perspective: A Research Study

In the age of globalization and information, sustainable development is a contemporary issue concerned with protecting future generations. Islam is not only a religion but also a guide for the whole of life, based on the divine principles of Shari‘a, which also address sustainable development for mankind. Indeed, many values and principles that are central to Islam are directed towards the prosperity of people and the development of society. On the other hand, the Industrial Revolution brought great destruction to the earth, because in the capitalist system people are concerned with themselves rather than with society. Islamic social responsibility teaches unity and calls mankind an ummah (community), a moderate ummah that is not allowed to make mischief on the earth. Everything on the earth is a gift from Allah to mankind; man is the deputy of Allah and a steward (khalīfah) of the earth, so it is his responsibility to protect the world from harm. The main objective of this research is to present the principles and applications of Islam in the sustainable development debate, especially its ecological aspect.

Some New Families of Continuous Distributions Generated from Burr XII Logit

This thesis consists of six chapters, in which five new families of distributions are introduced using the Burr XII distribution. Chapter 1 gives a brief introduction to the existing families of distributions and presents the objectives and organization of the thesis. In Chapter 2, the Generalized Burr G family of distributions is proposed using the function −log[1 − G(x)] of the cdf. In Chapter 3, the Marshall-Olkin Burr G family of distributions is introduced, using as generator the odd Burr G family proposed by Alizadeh et al. (2017). In Chapter 4, the odd Burr G Poisson family of distributions is introduced by compounding the odd Burr G family with the zero-truncated Poisson distribution. In Chapter 5, a new generalized Burr distribution based on the quantile function is proposed, following the method of Aljarrah et al. (2014). In Chapter 6, the Kumaraswamy odd Burr G family of distributions is introduced using the odd Burr G family as a generator. The mathematical properties of these families are derived, including asymptotes and shapes, infinite mixture representations of the densities, the rth moment, the sth incomplete moment, the moment generating function, mean deviations, reliability and stochastic ordering, and the Rényi and Shannon entropies. An explicit expression for the distribution of the ith order statistic is also obtained as a linear combination of baseline densities and probability weighted moments. Model parameters are estimated by the maximum likelihood (ML) method for complete and censored samples. Special models are given for each family, and plots of their density and hazard rate functions are displayed. One special model for each family is investigated in detail, and simulation studies are carried out to assess the validity of the ML estimates for that model. The proposed families are applied to real-life data to check their performance.