Predicting the pro-longevity or anti-longevity effect of model organism genes with new hierarchical feature selection methods

Wan, Cen, Freitas, Alex A., de Magalhaes, João Pedro (2015) Predicting the pro-longevity or anti-longevity effect of model organism genes with new hierarchical feature selection methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12 (2). pp. 262-275. ISSN 1545-5963. (doi:10.1109/TCBB.2014.2355218) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:47996)

PDF
Language: English

Restricted to Repository staff only
Official URL:
http://dx.doi.org/10.1109/TCBB.2014.2355218

Abstract

Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data, in order to try to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms’ genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.
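
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way a feature hierarchy could be exploited before Naïve Bayes classification. It assumes binary Gene Ontology term features that obey the "is-a" hierarchy (a gene annotated with a term is also annotated with all of that term's ancestors) and, for each gene to be classified, drops terms whose values are implied by a more informative term. The term identifiers, the toy hierarchy, the training genes, and the function names are all hypothetical and are not taken from the paper or from its datasets.

```python
from math import log

# Hypothetical toy hierarchy: child term -> set of parent terms ("is-a" DAG).
PARENTS = {
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B", "GO:C"},
}
TERMS = ["GO:A", "GO:B", "GO:C", "GO:D"]


def ancestors(term):
    """Return every ancestor of `term` in the toy DAG."""
    found, stack = set(), list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, ()))
    return found


def descendants(term):
    """Return every term that has `term` among its ancestors."""
    return {t for t in TERMS if term in ancestors(t)}


def select_features(instance):
    """Keep only the non-redundant terms for one gene (dict term -> 0/1)."""
    redundant = set()
    for term, value in instance.items():
        # A present term (1) implies all its ancestors are present;
        # an absent term (0) implies all its descendants are absent.
        redundant |= ancestors(term) if value == 1 else descendants(term)
    return [t for t in TERMS if t not in redundant]


def predict(train, instance):
    """Bernoulli Naive Bayes with Laplace smoothing on the selected terms."""
    selected = select_features(instance)
    best_label, best_score = None, float("-inf")
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        score = log(len(rows) / len(train))  # log prior of the class
        for term in selected:
            matches = sum(1 for x in rows if x[term] == instance[term])
            score += log((matches + 1) / (len(rows) + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy genes, for illustration only ("pro" / "anti" longevity).
TRAIN = [
    ({"GO:A": 1, "GO:B": 1, "GO:C": 0, "GO:D": 0}, "pro"),
    ({"GO:A": 1, "GO:B": 1, "GO:C": 1, "GO:D": 1}, "pro"),
    ({"GO:A": 1, "GO:B": 0, "GO:C": 1, "GO:D": 0}, "anti"),
    ({"GO:A": 0, "GO:B": 0, "GO:C": 0, "GO:D": 0}, "anti"),
]
QUERY = {"GO:A": 1, "GO:B": 1, "GO:C": 0, "GO:D": 0}

if __name__ == "__main__":
    print(select_features(QUERY))
    print(predict(TRAIN, QUERY))
```

In this sketch the selection is done per gene at classification time, so different genes can end up with different feature subsets. That is a design choice of the example, made so that the redundancy rule can use the actual 0/1 values of the gene being classified.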

Item Type: Article
DOI/Identification number: 10.1109/TCBB.2014.2355218
Uncontrolled keywords: data mining, machine learning, hierarchical feature selection, ageing, bioinformatics
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Alex Freitas
Date Deposited: 17 Apr 2015 15:17 UTC
Last Modified: 17 Aug 2022 10:58 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/47996 (The current URI for this page, for reference purposes)

University of Kent Author Information

Wan, Cen.

Freitas, Alex A.

Creator's ORCID: https://orcid.org/0000-0001-9825-4700