A Novel Feature Selection Method for Uncertain Features: An Application to the Prediction of Pro-/Anti- Longevity Genes

Da Silva, Pablo Nascimento, Plastino, Alexandre, Fabris, Fabio, Freitas, Alex A. (2020) A Novel Feature Selection Method for Uncertain Features: An Application to the Prediction of Pro-/Anti- Longevity Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics. ISSN 1545-5963. (doi:10.1109/TCBB.2020.2988450) (KAR id:81027)

PDF Author's Accepted Manuscript
Language: English
Official URL: https://doi.org/10.1109/TCBB.2020.2988450

Abstract

Understanding the ageing process is a very challenging problem for biologists. To help in this task, there has been a growing use of classification methods (from machine learning) to learn models that predict whether a gene influences the process of ageing or promotes longevity. One type of predictive feature often used for learning such classification models is Protein-Protein Interaction (PPI) features. One important property of PPI features is their uncertainty, i.e., a given feature (PPI annotation) is often associated with a confidence score, which is usually ignored by conventional classification methods. Hence, we propose the Lazy Feature Selection for Uncertain Features (LFSUF) method, which is tailored for coping with the uncertainty in PPI confidence scores. In addition, following the lazy learning paradigm, LFSUF selects features for each instance to be classified, making the feature selection process more flexible. We show that our LFSUF method achieves better predictive accuracy when compared to other feature selection methods that either do not explicitly take PPI confidence scores into account or deal with uncertainty globally rather than using a per-instance approach. Also, we interpret the results of the classification process using the features selected by LFSUF, showing that the number of selected features is significantly reduced, assisting the interpretability of the results. The datasets used in the experiments and the program code of the LFSUF method are freely available on the web at http://github.com/pablonsilva/FSforUncertainFeatureSpaces.
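
The per-instance ("lazy") idea described in the abstract can be illustrated with a toy sketch. This is not the LFSUF algorithm from the paper: a plain confidence threshold and a k-nearest-neighbour vote stand in for it, and the function names, the `threshold` and `k` parameters, and the binary feature encoding are all assumptions made only for illustration. The authors' actual code is in the repository linked above.

```python
from collections import Counter


def select_features_for_instance(confidences, threshold=0.7):
    """Return indices of features whose confidence score for THIS instance
    meets the threshold (hypothetical rule, not the paper's criterion)."""
    return [j for j, c in enumerate(confidences) if c >= threshold]


def lazy_knn_predict(train_X, train_y, test_conf, k=3, threshold=0.7):
    """Classify one instance using only the features selected for it.

    train_X:   list of binary feature vectors (1 = PPI annotation present)
    train_y:   list of class labels, one per training instance
    test_conf: confidence scores in [0, 1] for the instance to classify
               (0 = annotation absent)
    """
    # Feature selection happens here, at prediction time, per instance.
    selected = select_features_for_instance(test_conf, threshold)

    # The test instance is treated as having value 1 on every selected
    # feature, so its distance to a training instance is the number of
    # selected features that the training instance lacks.
    distances = [
        (sum(1 for j in selected if row[j] != 1), label)
        for row, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])  # stable sort keeps input order on ties

    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0], selected
```

Because the selection step runs inside the prediction call, two instances with different confidence profiles can be classified with different feature subsets, which is the flexibility that the lazy paradigm provides over a single global selection.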

Item Type: Article
DOI/Identification number: 10.1109/TCBB.2020.2988450
Uncontrolled keywords: Ageing, Classification, Feature Selection, Uncertain Features, Gene Ontology, Protein-Protein Interaction
Subjects: Q Science
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Alex Freitas
Date Deposited: 28 Apr 2020 10:06 UTC
Last Modified: 16 Feb 2021 14:12 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/81027 (The current URI for this page, for reference purposes)
Fabris, Fabio: https://orcid.org/0000-0001-7159-4668
Freitas, Alex A.: https://orcid.org/0000-0001-9825-4700