Kent Academic Repository

New Probabilistic Graphical Models and Meta-Learning Approaches for Hierarchical Classification, with Applications in Bioinformatics and Ageing

Fabris, Fabio (2017) New Probabilistic Graphical Models and Meta-Learning Approaches for Hierarchical Classification, with Applications in Bioinformatics and Ageing. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.63883) (KAR id:63883)

PDF
Language: English


Official URL:
https://doi.org/10.22024/UniKent/01.02.63883

Abstract

This interdisciplinary work proposes new hierarchical classification algorithms and evaluates them on biological datasets, and specifically on ageing-related datasets. Hierarchical classification is a type of classification task where the classes to be predicted are organized into a hierarchical structure. The focus on ageing is justified by the increasing impact that ageing-related diseases have on the human population and by the increasing amount of freely available ageing-related data.

The main contributions of this thesis are as follows. First, we improve the running time of a previously proposed hierarchical classification algorithm based on an extension of the well-known Naive Bayes classification algorithm. We show that our modification greatly improves the runtime of the hierarchical classification algorithm, maintaining its predictive performance.

We also propose four new hierarchical classification algorithms. The focus on hierarchical classification algorithms and their evaluation on biological data is justified because the class labels of biological data are commonly organized into class hierarchies. Two of our four new hierarchical classification algorithms, the "Hierarchical Dependence Network" (HDN) and the "Hierarchical Dependence Network algorithm based on finding non-Hierarchically related Predictive Classes" (HDN-nHPC), are based on Dependence Networks, a relatively new type of probabilistic graphical model that has not yet received much attention from the classification community. The other two hierarchical classification algorithms we propose are hybrid algorithms that use the hierarchical classification models produced by the Predictive Clustering Tree (PCT) algorithm. One of the hybrids combines the models produced by the PCT algorithm and a Local Hierarchical Classification (LHC) algorithm (which induces a local model for each class in the hierarchy). The other hybrid combines the models produced by the PCT and HDN algorithms.
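As an illustration of the "local model for each class" idea mentioned above, the following is a minimal sketch of how per-class training targets can be derived from a class hierarchy. It is not the LHC algorithm evaluated in the thesis: the toy hierarchy `PARENT`, the helper names, and the policy of treating every instance annotated with a class or one of its descendants as a positive example are all assumptions made for illustration.

```python
# Hypothetical toy hierarchy: each class maps to its parent (None marks a root).
PARENT = {"A": None, "A.1": "A", "A.2": "A", "A.1.1": "A.1"}


def with_ancestors(labels, parent=PARENT):
    """Return the given labels plus every ancestor implied by the hierarchy."""
    closed = set()
    for label in labels:
        # Walk up towards the root, stopping early at classes already added.
        while label is not None and label not in closed:
            closed.add(label)
            label = parent[label]
    return closed


def local_binary_targets(instance_labels, parent=PARENT):
    """Build one binary target vector per class (assumed positive/negative policy).

    A local approach would then train one binary model per class on its vector.
    """
    expanded = [with_ancestors(labels, parent) for labels in instance_labels]
    return {cls: [int(cls in labels) for labels in expanded] for cls in parent}


# Three toy instances: one annotated with A.1.1, one with A.2, one unlabeled.
targets = local_binary_targets([{"A.1.1"}, {"A.2"}, set()])
```

Each entry of `targets` could be paired with the instances' feature vectors to train any binary classifier; the choice of base classifier is left open here.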

We have tested our four proposed algorithms and four other commonly used hierarchical classification algorithms on 42 hierarchical classification datasets. Twenty of these datasets were created by us and are freely available to researchers. We have concluded that, for one of the three hierarchical predictive accuracy measures used in our experiments, one of our four new algorithms (the HDN-nHPC algorithm) outperforms the other seven algorithms in terms of average rank across the 42 hierarchical classification datasets.
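The comparison by average rank can be sketched as follows: rank the algorithms on each dataset (rank 1 for the best score), then average each algorithm's rank over all datasets. This is a generic sketch, not the thesis's evaluation code. The algorithm names and scores are invented, and giving tied scores the mean of their ranks is an assumption.

```python
from statistics import mean


def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mean of their ranks."""
    ordered = sorted(scores, reverse=True)
    return [mean(i + 1 for i, s in enumerate(ordered) if s == score) for score in scores]


def average_ranks(results):
    """results maps dataset -> {algorithm -> score}; returns algorithm -> mean rank."""
    algorithms = sorted(next(iter(results.values())))
    per_dataset = [ranks_desc([scores[a] for a in algorithms]) for scores in results.values()]
    return {a: mean(r[i] for r in per_dataset) for i, a in enumerate(algorithms)}


# Invented scores for illustration only; these are not results from the thesis.
toy = {
    "d1": {"alg_x": 0.70, "alg_y": 0.60, "alg_z": 0.60},
    "d2": {"alg_x": 0.50, "alg_y": 0.80, "alg_z": 0.40},
}
avg = average_ranks(toy)
```

A lower average rank indicates an algorithm that tends to score better across datasets, which makes the comparison independent of the scale of the accuracy measure on any single dataset.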

We have also proposed the first meta-learning approach for hierarchical classification problems. In meta-learning, each meta-instance represents a dataset, meta-features represent dataset properties, and meta-classes represent the best classification algorithm for the corresponding dataset (meta-instance). Hence, meta-learning techniques for classification use the predictive performance of some candidate classification algorithms in previously tested datasets, and dataset descriptors (the meta-features), to infer the performance of those candidate classification algorithms in new datasets, given the meta-features of those new datasets.
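The structure of a meta-learning dataset described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the meta-learning system proposed in the thesis: the two meta-features (number of classes, hierarchy depth), the algorithm names, the scores, and the use of a 1-nearest-neighbour rule as the meta-learner are all hypothetical stand-ins.

```python
import math


def meta_class(performance):
    """Meta-class of a dataset: the candidate algorithm with the best score on it."""
    return max(performance, key=performance.get)


def build_meta_dataset(meta_features, performances):
    """One meta-instance per dataset: (meta-feature vector, meta-class)."""
    return [(meta_features[d], meta_class(performances[d])) for d in meta_features]


def recommend(meta_dataset, new_features):
    """Return the meta-class of the nearest previously tested dataset (1-NN stand-in)."""
    _, label = min(meta_dataset, key=lambda mi: math.dist(mi[0], new_features))
    return label


# Hypothetical meta-features: (number of classes, hierarchy depth).
features = {"d1": (10, 3), "d2": (200, 6)}
# Invented predictive performance of two candidate algorithms on each dataset.
scores = {
    "d1": {"alg_x": 0.7, "alg_y": 0.6},
    "d2": {"alg_x": 0.4, "alg_y": 0.5},
}
meta = build_meta_dataset(features, scores)
suggestion = recommend(meta, (180, 5))
```

In practice the meta-features would need scaling before a distance-based rule is applied, and an interpretable meta-learner would be needed if the goal is to inspect the resulting meta-model.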

The predictions of our meta-learning system can be used as a guide to choose which hierarchical classification algorithm (out of a set of candidate ones) to use on a new dataset, without the need for time-consuming trial and error experiments with those candidate algorithms. This is particularly important for hierarchical classification problems, as the training time of hierarchical classification algorithms tends to be much greater than the training time of 'flat' classification algorithms. This increased training time is mainly due to the typically much greater number of class labels that annotate the instances of hierarchical classification problems.

We have tested the predictive power of our meta-learning system and interpreted some generated meta-models. We have concluded that our meta-learning system had good predictive performance when compared to other baseline meta-learning approaches. We have also concluded that the meta-rules generated by our meta-learning system were useful to identify dataset characteristics to assist the choice of hierarchical classification algorithm.

Finally, we have reviewed the current practice of applying supervised machine learning (classification and regression) algorithms to study the biology of ageing. This review discusses the main findings of such algorithms, in the context of the ageing biology literature. We have also interpreted some of the hierarchical classification models generated in our experiments. Both the above literature review and the interpretation of some models were performed in collaboration with an ageing expert, in order to extract relevant information for ageing research.

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Freitas, Alex A.
DOI/Identification number: 10.22024/UniKent/01.02.63883
Uncontrolled keywords: Hierarchical Classification; Classification; Data Mining; Bioinformatics; Ageing; Predictive Clustering Tree; Dependency Network; Hierarchical Dependency Network; Classification Model Interpretation; Meta-learning
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Funders: [37325] UNSPECIFIED
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 06 Oct 2017 13:54 UTC
Last Modified: 05 Nov 2024 10:59 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/63883 (The current URI for this page, for reference purposes)
