
Development of Feature Selection Algorithms for High-Dimensional Binary Data

Thesis Info



Javed, Kashif




University of Engineering and Technology







Thesis Completing Year


Thesis Completion Status



Applied Sciences





2021-02-17 19:49:13


2024-03-24 20:25:49





There has been a growing interest in representing real-life applications with data sets having binary-valued features. Owing to advances in computer and data management systems, these data sets consist of tens or hundreds of thousands of features. In this dissertation, we investigate two problems in machine learning that have been relatively little studied for high-dimensional binary data. The first problem is to select, from the entire feature set, a subset of features useful for supervised learning applications; this is known as the feature selection (FS) problem. The second problem is to compare two orderings of features induced by feature ranking (FR) algorithms and to determine which one is better. For the feature selection problem, we have proposed a new feature ranking measure termed the diff-criterion. Its distinct attribute is that it estimates the usefulness of binary features by using their probability distributions. The diff-criterion has been evaluated against two well-known FS algorithms with four widely used classifiers on six binary data sets, on which it has achieved up to about 99% reduction in the feature set size. To further improve the performance, we have suggested a two-stage FS algorithm. The novelty of our two-stage algorithm is that the first stage provides the second stage with a reduced subset without losing valuable information about the class. Two-stage feature selection used with the diff-criterion not only significantly improves the classification accuracy but also exhibits up to about 99% reduction in the feature set size. We have also compared our proposed FS algorithms against the winning entries of the “Agnostic Learning versus Prior Knowledge” challenge. The algorithms have shown results better than or comparable to those of the winners of the challenge.
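The abstract does not give the formula of the diff-criterion, so the following is only a minimal sketch of the general idea of scoring binary features from their probability distributions. It assumes a two-class problem and uses a stand-in score, the absolute gap between the class-conditional rates of a feature being 1. The names `class_conditional_rates`, `probability_gap_scores`, and `rank_features`, as well as the toy data, are hypothetical and are not taken from the thesis.

```python
def class_conditional_rates(X, y):
    """Return (p_pos, p_neg): per-feature fraction of ones within each class."""
    n_features = len(X[0])
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    p_pos = [sum(r[j] for r in pos) / len(pos) for j in range(n_features)]
    p_neg = [sum(r[j] for r in neg) / len(neg) for j in range(n_features)]
    return p_pos, p_neg


def probability_gap_scores(X, y):
    """Stand-in score (an assumption, not the diff-criterion itself):
    |P(x_j = 1 | class 1) - P(x_j = 1 | class 0)| for every feature j."""
    p_pos, p_neg = class_conditional_rates(X, y)
    return [abs(a - b) for a, b in zip(p_pos, p_neg)]


def rank_features(scores):
    """Feature indices sorted by descending score; ties keep index order."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])


# Toy binary data: 6 samples, 3 features, labels in {0, 1}.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

scores = probability_gap_scores(X, y)
ranking = rank_features(scores)
print(scores, ranking)
```

Keeping only the first few indices of `ranking` would give a reduced feature subset; how many to keep, and how a second selection stage would use that subset, is left open here because the abstract does not specify it.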
For the problem of ranking features, different FR algorithms estimate the importance of features with respect to the class variable differently, thus generating different orderings. To determine which ordering is better, we propose a new evaluation method termed the feature ranking evaluation strategy (FRES). It uses the individual predictive power of features to estimate how correct an ordering of features is. We found that, compared to the Relief and mutual information algorithms, our proposed diff-criterion generates the most correct orderings of binary features.
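The procedure behind FRES is not described in the abstract. As a rough illustration of judging an ordering by the individual predictive power of features, the sketch below makes two assumptions of its own: predictive power is the accuracy of the best single-feature rule (predict the feature value or its complement), and an ordering is scored by the fraction of feature pairs it places consistently with that power. The helper names and the toy data are hypothetical.

```python
def single_feature_accuracy(column, y):
    """Accuracy of the better of two rules: predict x, or predict 1 - x."""
    agree = sum(1 for x, label in zip(column, y) if x == label)
    return max(agree, len(y) - agree) / len(y)


def individual_power(X, y):
    """Per-feature predictive power, measured on the given data."""
    n_features = len(X[0])
    return [single_feature_accuracy([row[j] for row in X], y)
            for j in range(n_features)]


def ordering_agreement(ordering, power):
    """Fraction of pairs (earlier, later) in `ordering` where the earlier
    feature has at least as much individual power as the later one."""
    concordant = 0
    total = 0
    for a in range(len(ordering)):
        for b in range(a + 1, len(ordering)):
            total += 1
            if power[ordering[a]] >= power[ordering[b]]:
                concordant += 1
    return concordant / total


# Toy binary data: 6 samples, 3 features, labels in {0, 1}.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

power = individual_power(X, y)
good = ordering_agreement([0, 1, 2], power)   # strongest feature first
bad = ordering_agreement([2, 1, 0], power)    # weakest feature first
mixed = ordering_agreement([1, 0, 2], power)
print(power, good, bad, mixed)
```

For simplicity the power is measured on the same samples used for ranking; a more careful comparison would estimate it on held-out data.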



Similar Articles


Similar Article Headings


Index of Qur'anic Verses

Serial Number

Index of Qur'anic Verses


إِنَّ عَلَيْنَا جَمْعَهُ وَقُرْآنَهُ  فَإِذَا قَرَأْنَاهُ فَاتَّبِعْ قُرْآنَه


لَا يُصَدَّعُونَ عَنْهَا وَلَا يُنْزِفُونَ


لا مَقْطُوعَةٍ وَلا مَمْنُوعَةٍ


واشْتَعَلَ الرَّأْسُ شَيْبًا


وَلَكُمْ فِي الْقِصَاصِ حَيَاةٌ يَا أُولِي الْأَلْبَابِ


’’ لَا يَأْتِيهِ الْبَاطِلُ مِنْ بَيْنِ يَدَيْهِ وَلَا مِنْ خَلْفِهِ تَنْزِيلٌ مِنْ حَكِيمٍ حَمِيدٍ ‘‘


﴿ وَلَقَدْ صَرَّفْنَا لِلنَّاسِ فِي هَذَا الْقُرْآنِ مِنْ كُلِّ مَثَل


غُلِبَتْ الرُّومُ *فِي أَدْنَى الأَرْضِ وَهُمْ مِنْ بَعْدِ غَلَبِهِمْ سَيَغْلِبُونَ


سَيُهْزَمُ الْجَمْعُ وَيُوَلُّونَ الدُّبُرَ


قُلْ لَئِنْ اجْتَمَعَتْ الإِنسُ وَالْجِنُّ عَلَى أَنْ يَأْتُوا بِمِثْلِ هَذَا الْقُرْآنِ لاَ يَأْتُونَ بِمِثْلِهِ،


فَلْيَأْتُوا بِحَدِيثٍ مِثْلِهِ إِنْ كَانُوا صَادِقِينَ


قُلْ فَأْتُوا بِعَشْرِ سُوَرٍ مِثْلِهِ


فَأْتُوا بِسُورَةٍ مِنْ مِثْلِهِ


أَمْ يَقُولُونَ افْتَرَاهُ قُلْ فَأْتُوا بِسُورَةٍ مِثْلِهِ وَادْعُوا مَنْ اسْتَطَعْتُمْ مِنْ دُونِ اللَّهِ


وَلَقَدْ صَرَّفْنَا لِلنَّاسِ فِي هَذَا الْقُرْآنِ مِنْ كُلِّ مَثَلٍ


أَفَسِحْرٌ هَذَا أَمْ أَنْتُمْ لاَ تُبْصِرُونَ


ذَرْنِي وَمَنْ خَلَقْتُ وَحِيدًا


" غَيْرَ مُتَبَرِّجَاتٍ بِزِينَةٍ "


فِيهَا أَنْهَارٌ مِنْ مَاءٍ غَيْرِ آسِنٍ


لَوْلَا أَنْ تُفَنِّدُونِ


" يَسْأَلُونَكَ عَنِ الْأَنْفَالِ


لَقَدْ خَلَقْنَا الْإِنْسَانَ فِي كَبَد


The Influence of Staff Performance on Office Administration

This study aims to determine and analyze the influence of staff performance on office administration at the Tuhemberua sub-district office (Kantor Camat), North Nias Regency. The population of the study is the 12 employees of the Tuhemberua Sub-district Office. The data used are primary data. Data were collected through questionnaires, interviews, observation, tests, and documentation. Data analysis consisted of data verification, questionnaire processing, and data processing: (a) validity testing, (b) reliability testing, and (c) hypothesis testing. The calculated correlation coefficient was r_count (r_xy) = 0.834; checked against the table of critical values of the product-moment r at the 5% level, r_count = 0.834 > r_table = 0.576. Based on simple linear regression, the contribution of staff performance to office administration at the Gunungsitoli Tuhemberua Sub-district Office is 2.668. From the coefficient of determination, the magnitude of the influence of variable x on variable y at the Gunungsitoli Tuhemberua Sub-district Office is 69.48%. Under the hypothesis testing criteria, with Ha stating that an influence exists and H0 that no influence exists, t_count = 7.20 > r_table = 2.160, so it can be stated that staff performance influences office administration at the Tuhemberua sub-district office, North Nias Regency.

New Control Methods for a Class of Nonlinear Systems With Constrained Input

Conventional nonlinear feedback control tools include linearization, gain scheduling, integral control, feedback linearization, sliding mode control, Lyapunov redesign, backstepping, passivity-based control, etc. Each of these techniques is designed to deal with a specific kind of problem. None of these methods is universal in the sense that it can be applied to all classes of nonlinear control problems. The realm of nonlinear control systems encounters theoretical and practical problems that do not fit into existing frameworks. This demands the development of novel and innovative methods that go beyond the conventional philosophy of control systems. This thesis deals with one such class of problems, which is difficult to handle with the usual nonlinear control techniques. The core issue is hard constraints on the input of the system, which restrict the freedom of a control designer to incorporate control methods based on continuous stabilization, cancellation, compensation and/or adjustment of control parameters. The thesis starts with a discussion of the sampled-data tracking problem for a class of multi-input multi-output (MIMO) nonlinear systems. The class of systems is generic enough to cover many theoretical and practical problems. However, the thesis focuses mainly on a challenging example: the two-axis orientation control of a gyroscopic system with constrained input. During a single sample period, only a single fixed-amplitude pulse of variable position and width can be applied as the control input. The example also falls in the category of underactuated systems, since a single control drives two axes. Alternatively, pulse width and position can be construed as two inputs of the system. The output is also assumed to be available only at the sampling instants. All these restrictions result in a complex problem whose exact solution is not possible, and thus we have to resort to approximate methods. The thesis begins with an exploration of classical techniques.
Firstly, a more conventional pulse width modulation approach based on the principle of equivalent areas is proposed. This is followed by an error-minimized control technique based on optimal control. The solution minimizes a cost function so as to obtain optimal values of pulse width and position. The problems of local minima and non-causality have to be addressed in order to solve the problem. The main contribution of the thesis is a particle controller for the class of systems under discussion. The classical theory of particle filters is adapted in order to solve the global optimization problem; that is, a deterministic problem is solved using stochastic tools. The idea is to associate the cost function to be minimized with a probability density function (pdf). Input samples are drawn according to this pdf and are subsequently assigned weights using simulations of the system. The process includes steps such as generation, refinement, regeneration, and resampling, some of which are familiar from the realm of particle filters. This unconventional control philosophy has the potential to address a variety of control problems that are difficult to handle using available tools. Extensive Monte Carlo simulations have been performed for each of the above techniques. Where applicable, performance comparisons have also been made. The suggested techniques are computationally heavy and require fast processing. However, they suit parallel computing and can thus be embedded using FPGAs or ASICs.
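The idea of turning a cost function into sampling weights can be illustrated with a generic sample, weight, resample, and perturb loop. The sketch below is not the particle controller of the thesis: the quadratic `toy_cost`, the normalized sample period [0, 1], the weighting `exp(-cost / temperature)`, and all parameter values are assumptions chosen only to make the example self-contained. In a real setting the cost would come from simulating the system response to each candidate pulse.

```python
import math
import random


def toy_cost(position, width):
    """Hypothetical tracking cost with its minimum at position=0.3, width=0.2."""
    return (position - 0.3) ** 2 + (width - 0.2) ** 2


def clip_pulse(position, width):
    """Keep the pulse inside one normalized sample period [0, 1]."""
    position = min(max(position, 0.0), 1.0)
    width = min(max(width, 0.0), 1.0 - position)
    return position, width


def sample_based_minimize(cost, n_particles=200, n_iters=20,
                          temperature=0.01, seed=0):
    """Search for a low-cost (position, width) pair by weighted resampling."""
    rng = random.Random(seed)

    # Generation: draw feasible candidate pulses uniformly.
    particles = []
    for _ in range(n_particles):
        pos = rng.random()
        particles.append((pos, rng.random() * (1.0 - pos)))

    best = min(particles, key=lambda p: cost(*p))
    spread = 0.1
    for _ in range(n_iters):
        # Weighting: low cost maps to high (unnormalized) probability.
        weights = [math.exp(-cost(*p) / temperature) for p in particles]
        # Resampling: candidates survive in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Refinement: perturb the survivors and re-impose the constraints.
        particles = [clip_pulse(p + rng.gauss(0.0, spread),
                                w + rng.gauss(0.0, spread))
                     for p, w in particles]
        candidate = min(particles, key=lambda p: cost(*p))
        if cost(*candidate) < cost(*best):
            best = candidate
        spread *= 0.8  # narrow the search as the population concentrates
    return best, cost(*best)


best_pulse, best_cost = sample_based_minimize(toy_cost)
print(best_pulse, best_cost)
```

Each particle is evaluated independently of the others, which is why this style of search maps naturally onto parallel hardware.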