Architecture Recovery of Legacy Software Systems Using Unsupervised Machine Learning Techniques

Thesis Info

Access Option

External Link

Author

Maqbool, Onaiza

Program

PhD

Institute

Lahore University of Management Sciences

City

Lahore

Province

Punjab

Country

Pakistan

Thesis Completing Year

2006

Thesis Completion Status

Completed

Subject

Computer Science

Language

English

Link

http://prr.hec.gov.pk/jspui/handle/123456789/462

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727705163

Perhaps the most important aspect of maintaining legacy software systems is understanding their architecture. Architectural documentation is often unavailable, so the architectural design must be recovered from the source code. This thesis addresses the problem of recovering the architecture of software systems for greater understanding, and of modularizing them for greater maintainability, using machine learning techniques.
We use clustering to obtain a high-level view of a software system's architecture by identifying the major sub-systems within it. For this purpose, we analyze the behaviour of existing similarity and distance measures when applied to software artifacts, keeping software characteristics in view; this analysis yields explanations for some previously unanswered questions. We develop two new hierarchical clustering algorithms that address the problem of arbitrary decisions taken by existing hierarchical algorithms. We also propose a similarity measure suitable for software clustering. The performance of the proposed algorithms and similarity measure is evaluated using internal and external assessment. Instead of using only one expert decomposition for external assessment, as is commonly done, we use decompositions prepared by 4-5 experts for each test system. Such an approach allows us to validate the idea of multiple views of a software system. Experiments carried out on five open source legacy software systems show that our proposed algorithm performs better than previously used algorithms.
Interpreting the results of clustering algorithms is often difficult. To make clusters easier to understand, we propose a labeling scheme for clusters and compare two alternative ranking schemes that can be utilized for this purpose. We demonstrate how the labels assigned by our scheme aid understanding of the clustering process that these algorithms follow.
We also provide a comparison between cluster analysis and concept analysis as modularization techniques, and give examples of their application to different software structures, thus indicating the strengths and limitations of the two techniques. Finally, we use association rule mining to gain insight into the low-level structure of software systems by examining relationships between architectural quarks, i.e. functions, global variables and user-defined types. Metarule-guided association rule mining is used to identify problems within structured legacy systems. Re-engineering patterns that present solutions to these problems are proposed. Results for the test systems reveal interesting characteristics which allow us to understand legacy systems and their evolution.
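The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm or its proposed similarity measure: it assumes the Jaccard coefficient over sets of referenced identifiers and plain complete-linkage agglomeration, and the file names and feature sets below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def complete_linkage(c1, c2, features):
    """Similarity of two clusters = similarity of their least similar pair."""
    return min(jaccard(features[x], features[y]) for x in c1 for y in c2)


def agglomerate(features, k):
    """Merge the most similar pair of clusters until only k clusters remain."""
    clusters = [frozenset([name]) for name in sorted(features)]
    while len(clusters) > k:
        # max() returns the first best pair it sees, so ties are broken
        # by iteration order rather than by any property of the software.
        best = max(
            combinations(clusters, 2),
            key=lambda pair: complete_linkage(pair[0], pair[1], features),
        )
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters


# Hypothetical entities: each file is described by the identifiers it uses.
FEATURES = {
    "parse.c": {"Token", "next_token", "syntax_error"},
    "lex.c": {"Token", "next_token", "read_char"},
    "draw.c": {"Canvas", "draw_line", "flush"},
    "plot.c": {"Canvas", "draw_line", "axis"},
}

result = agglomerate(FEATURES, 2)
for cluster in result:
    print(sorted(cluster))
```

On this toy input the first merge is a tie between two equally similar pairs, and `max` simply takes the first one it encounters. That is the kind of arbitrary decision the abstract says existing hierarchical algorithms make.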

Similar

So to Speak, You Are Not Apart from Me Even for a Moment

So to speak, you are not apart from me even for a moment,
Yet it is also true that you have never fully revealed yourself to me

The Importance of the Applied Approach in Teaching the Prophetic Hadîth and Its Sciences

Hadîth is the second most important source of Islamic law after the Qur’ān. There is a consensus among Muslims that the Sunnah is the second revealed fundamental source of the Islamic sciences. Because of the importance of these fundamental sources, Muslim scholars and educational institutions around the world have played an important role in the development of the Hadîth sciences. There are different teaching methodologies and learning approaches, and a range of teaching methods should be used to raise the quality of Hadîth studies and achieve their objectives. The applied approach emphasizes the relevance of what is being learnt to the real world outside the classroom and makes that relevance as immediate and transparent as possible. It is a valuable approach that can be used at all levels of education. It motivates students, improves their confidence and provides a meaningful context for learning both theoretical concepts and practical skills. Using the applied approach in the teaching and learning of Hadîth and its sciences opens up immense possibilities for the development of Hadîth studies. The challenge is to ensure that the applied approach plays a constructive role in improving the educational quality of Hadîth studies as far as possible. This research article examines the importance of the applied approach in the teaching of Hadîth and its sciences.

Inheritance and Genetic Variability of Wheat Germplasm Triticum aestivum L. of NWFP, Pakistan Determined by Morphological Traits and Biochemical Markers.

One hundred accessions of wheat (Triticum aestivum L.) germplasm were evaluated for days to emergence, days to heading, days to maturity, number of tillers plant⁻¹, plant height (cm), spike length (cm), number of spikelets spike⁻¹, grain yield plant⁻¹ (g), 1000-grain weight (g) and yield (kg ha⁻¹). Mean, range, standard deviation and coefficient of variation were computed for each quantitative trait to estimate the extent of genetic diversity present in the local wheat germplasm. Correlation coefficient, cluster and principal component analyses were carried out. The whole set of germplasm was subjected to SDS-PAGE analysis to investigate genetic variation in high-molecular-weight (HMW) glutenin subunits. To study the genetic variability of the one hundred wheat entries, an experiment was conducted during the 2004-2005 growing season in an augmented field design at the research area of the Department of Plant Breeding and Genetics, Faculty of Agriculture, Gomal University, Dera Ismail Khan, NWFP, Pakistan. All the germplasm was evaluated and characterized for the same ten traits. Genetic diversity was found to be satisfactory for all the traits, and statistical analysis showed variation in all the parameters. Days to emergence varied from 7.10 to 20.10 days, with a coefficient of variation of 24.03%. Days to heading ranged from 79.15 [PARC/NIAR 00203 (05)] to 130.25 [PARC/MAFF 004271 (01)] days, with a coefficient of variation of 9.35%. The entry PARC/NIAR 00203 (05) took the fewest days to maturity (136), while PARC/MAFF 004271 (01) took the most (193). The entry PARC/NIAR 002809 (01) produced the maximum number of tillers plant⁻¹.
PARC/NIAR 00203 (05) had the maximum plant height (125.6 cm), while the entry PARC/MAFF 004270 (03) had the shortest (53.2 cm). Days to emergence had a significant positive correlation with number of tillers plant⁻¹ and a significant negative correlation with days to maturity. Days to heading had a significant positive correlation with days to maturity. Days to maturity had a negative correlation with plant height and number of tillers plant⁻¹. Plant height had a significant correlation with number of tillers plant⁻¹. The frequency distribution shows that spike length ranged from 6.2 to 22.1 cm. Among the accessions, spike length varied from 6.50 to 21.90 cm, with a mean of 12.23 ± 2.28 cm and a coefficient of variation of 18.63%. Number of spikelets spike⁻¹ ranged from 8.50 to 29.80, with a mean of 16.35 ± 3.00 and a coefficient of variation of 8.32%. The frequency distribution for number of spikelets spike⁻¹ ranged from 7.1 to 31.00. Grain yield plant⁻¹ ranged from 1.26 to 4.58 g, with a mean of 2.36 ± 0.52 g and a coefficient of variation of 21.89%. The frequency distribution for grain yield plant⁻¹ ranged from 1.26 to 3.32 g. 1000-grain weight varied from 15.74 to 46.65 g, with a mean of 34.20 ± 8.05 g and a coefficient of variation of 23.55%. The frequency distribution for 1000-grain weight ranged from 15.20 to 47.19 g. Grain yield ranged from 2610 to 5058 kg ha⁻¹, with a mean of 4165 ± 504.45 kg ha⁻¹ and a coefficient of variation of 12.11%. The frequency distribution for grain yield ranged from 2610 to 5065.9 kg ha⁻¹.
Spike length showed a highly significant positive correlation with number of spikelets spike⁻¹ (r = 0.20), grain yield plant⁻¹ (r = 0.16) and grain yield (kg ha⁻¹) (r = 0.18), and a negative correlation with 1000-grain weight (r = -0.02). Number of spikelets spike⁻¹ had a highly significant positive correlation with grain yield plant⁻¹ (r = 0.49) and grain yield (kg ha⁻¹) (r = 0.34), and a positive correlation with 1000-grain weight (r = 0.02). Grain yield plant⁻¹ had a highly significant positive correlation with 1000-grain weight (r = 0.30) and grain yield (kg ha⁻¹) (r = 0.62). A highly significant positive correlation was observed between 1000-grain weight and grain yield (kg ha⁻¹) (r = 0.44). Clustering on the basis of morphological similarities grouped the accessions into fifteen and thirteen clusters for the years 2005 and 2006 respectively. A scatter diagram based on altitude and latitude shows that accessions collected from 1200-2000 m a.s.l. and 30°39′ to 34°40′ N latitude were more similar morphologically to one another than to the other group. The principal components with eigenvalues greater than 1 accounted for more than 61.62% of the genetic variation among the wheat accessions. The first three PCs contributed more than 51.65% of the genetic diversity among the total accessions in both years. PC1 accounted for 25.62% of the variation and was positively associated with the majority of the traits. The character that contributed most positively to PC1 was days to maturity. Considerable variation was found, with a total of 12 different HMW glutenin subunit compositions. The frequency of 7+8 and 2+12 was the highest in the entire set of germplasm.
During the present investigation, fifteen accessions possessing the 5+10 allele, a known source of good bread-making quality, were identified: PARC/MAFF 4272 (01), PARC/MAFF 4269 (01), PARC/MAFF 4358 (01), PARC/MAFF 4355 (02), PARC/JICA 3835 (05), PARC/MAFF 4358 (03), PARC/MAFF 4292 (01), PARC/MAFF 4354 (02), PARC/MAFF 4354 (01), PARC/MAFF 4264 (03), PARC/MAFF 4280 (03), PARC/MAFF 4269 (02), PARC/MAFF 4279 (01), PARC/MAFF 4277 (01) and PARC/MAFF 4277 (02). In the experiment, the research material comprised four wheat (Triticum aestivum L.) varieties, i.e. Bhakker-2002, Takbeer, BWP-2000 and Uqab-2000, with twelve hybrids each of the F1 and F2 generations. These were analyzed in a randomized complete block design to study gene action, genetic advance, combining ability, heritability and heterotic effects for different quantitative and qualitative parameters. The cultivars were crossed in a complete diallel fashion, following Hayman's diallel analysis and Griffing's approach, to identify useful recombinations in segregating generations that could be used in any hybridization program. The Hayman-Jinks model showed that the additive-dominance model was adequate for all the parameters in the F1 and F2 generations. The parents and hybrids differed significantly for most parameters, reflecting genetic segregation. In the F1 generation, the additive and dominance variances were significant for all the parameters except 1000-grain weight, while days to heading, spikelets spike⁻¹ and 1000-grain weight showed non-significant values for the dominance components. In the F2 generation, only four parameters, i.e. days to heading, days to maturity, plant height and 1000-grain weight, were non-significant for additive variance. The dominance variance was non-significant for all the parameters except spike length in the F2 generation.
Hence the majority of the parameters in both the F1 and F2 generations were governed by additive gene action. The degree of dominance was also reflected in the F2 generation values, which were lower than the F1 generation values. Broad-sense and narrow-sense heritability, along with genetic advance, were also recorded in both the F1 and F2 generations for qualitative and quantitative parameters, indicating the importance of genetic variance. The experimental work makes clear that the hybrids were significant for all the parameters, indicating the genetic divergence of the parents used in the wheat breeding program. General combining ability, specific combining ability and reciprocal effects in the F1 and F2 generations were significant for all the parameters, except that number of spikelets spike⁻¹ was non-significant for general combining ability, and number of tillers plant⁻¹, spike length, spikelets spike⁻¹, number of grains spike⁻¹ and grain yield plant⁻¹ were non-significant for specific combining ability, in the F1 generation. The combining ability analysis indicates that most parameters were governed by partial dominance with additive gene action in the F1 and F2 generations. However, most hybrids proved to be the best general combiners. It was therefore clear that parents with comparatively low and high values performed best in the determination of specific combining ability. Heterotic analysis showed that heterosis over the mid-parent was much more pronounced than heterosis over the better parent in the F1 generation, accompanied by inbreeding depression in the F2 generation.
On the basis of the findings for additive gene effect, dominance gene effect, specific gene effect, maternal effect, genetic advance, broad-sense heritability, narrow-sense heritability, general combining ability, specific combining ability and heterotic analysis with inbreeding depression, it was clear that, among all the hybrids, only Takbeer x Uqab-2000, Bhakker-2002 x BWP-2000 and Bhakker-2002 x Uqab-2000 showed the best potential, with all the desirable parameters, for further wheat breeding under the different agro-climatic conditions of Dera Ismail Khan, NWFP, Pakistan, and for general cultivation.
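The coefficients of variation quoted in this abstract can be related to the reported means and standard deviations with a small sketch. It assumes the usual definition, CV = 100 × standard deviation / mean, and that the "±" figures are standard deviations; the abstract states neither explicitly, and the function and variable names are illustrative only.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: 100 * sd / mean."""
    return 100.0 * sd / mean


# (mean, assumed standard deviation, CV as reported in the abstract)
REPORTED = {
    "spike length (cm)": (12.23, 2.28, 18.63),
    "1000-grain weight (g)": (34.20, 8.05, 23.55),
    "grain yield (kg/ha)": (4165.0, 504.45, 12.11),
}

# Difference between the recomputed CV and the reported CV, per trait.
differences = {
    trait: coefficient_of_variation(mean, sd) - reported_cv
    for trait, (mean, sd, reported_cv) in REPORTED.items()
}

for trait, diff in differences.items():
    print(f"{trait}: recomputed CV differs from reported by {diff:+.3f} points")
```

For these three traits the recomputed values agree with the reported ones to within rounding, which is consistent with the assumed definition.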