
New KEGG pathway-based interpretable features for classifying ageing-related mouse proteins

Fabris, Fabio, Freitas, Alex A. (2016) New KEGG pathway-based interpretable features for classifying ageing-related mouse proteins. Bioinformatics, 32 (19). pp. 2988-2995. ISSN 1367-4803. (doi:10.1093/bioinformatics/btw363) (KAR id:56869)

PDF Author's Accepted Manuscript
Language: English


Motivation: The incidence of ageing-related diseases has been increasing steadily in recent decades, raising the need for effective methods to analyze ageing-related protein data. These methods should have high predictive accuracy and be easily interpretable by ageing experts. To enable this, one needs interpretable classification models (supervised machine learning) and features with rich biological meaning. In this paper we propose two interpretable feature types based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and compare them with traditional feature types in hierarchical classification (a more challenging classification task regarding predictive performance) and binary classification (a classification task producing easier-to-interpret classification models). As far as we know, this work is the first to: (i) explore the potential of the KEGG pathway data in the hierarchical classification setting, (ii) use the graph structure of KEGG pathways to create a feature type that quantifies the influence of a current protein on another specific protein within a KEGG pathway graph and (iii) propose a method for interpreting the classification models induced using KEGG features.

Results: We performed tests measuring predictive accuracy considering hierarchical and binary class labels extracted from the Mouse Phenotype Ontology. One of the KEGG feature types leads to the highest predictive accuracy among five individual feature types across three hierarchical classification algorithms. Additionally, the combination of the two KEGG feature types proposed in this work results in one of the best predictive accuracies when using the binary class version of our datasets, at the same time enabling the extraction of knowledge from ageing-related data using quantitative influence information.

Item Type: Article
DOI/Identification number: 10.1093/bioinformatics/btw363
Uncontrolled keywords: data mining, machine learning, classification, bioinformatics, ageing, Computational Intelligence Group
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Alex Freitas
Date Deposited: 17 Aug 2016 17:12 UTC
Last Modified: 09 Dec 2022 04:31 UTC