Predicting the pro-longevity or anti-longevity effect of model organism genes with new hierarchical feature selection methods

Wan, Cen, Freitas, Alex A., de Magalhaes, João Pedro (2015) Predicting the pro-longevity or anti-longevity effect of model organism genes with new hierarchical feature selection methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12 (2). pp. 262-275. ISSN 1545-5963. (doi:10.1109/TCBB.2014.2355218) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided)

Abstract

Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data, in order to try to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms’ genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.
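As a rough illustration of what "exploiting hierarchical relationships in the set of features" can mean, the sketch below removes hierarchically redundant binary features for a single gene. It is not taken from the paper and should not be read as the authors' algorithm. It assumes an is-a hierarchy in which a term with value 1 implies that all its ancestors are 1, and a term with value 0 implies that all its descendants are 0. The term names, the `PARENTS` map, and the function names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy hierarchy (not real Gene Ontology terms):
# each term maps to the set of its direct parents in an is-a DAG.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def descendants(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term that has `term` among its ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


def select_non_redundant(instance: Dict[str, int],
                         parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep only features whose value is not implied by another feature.

    Assumption: value 1 on a term implies 1 on all its ancestors, and
    value 0 on a term implies 0 on all its descendants.
    """
    redundant: Set[str] = set()
    for term, value in instance.items():
        if value == 1:
            redundant |= ancestors(term, parents)    # implied positives
        else:
            redundant |= descendants(term, parents)  # implied negatives
    return {t for t in instance if t not in redundant}


# One gene's annotation vector, consistent with the hierarchy above.
gene = {"root": 1, "A": 1, "B": 0, "A1": 1, "A2": 0, "AB": 0}
selected = select_non_redundant(gene, PARENTS)
```

For this toy gene, the features kept are `A1`, `A2` and `B`. The values of `root`, `A` and `AB` follow from those three, so a classifier such as Naïve Bayes would no longer count the same evidence several times.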

Item Type: Article
DOI/Identification number: 10.1109/TCBB.2014.2355218
Uncontrolled keywords: data mining, machine learning, hierarchical feature selection, ageing, bioinformatics
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Faculties > Sciences > School of Computing > Computational Intelligence Group
Depositing User: Alex Freitas
Date Deposited: 17 Apr 2015 15:17 UTC
Last Modified: 29 May 2019 14:26 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/47996 (The current URI for this page, for reference purposes)