An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features

Wan, Cen, Freitas, Alex A. (2017) An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features. Artificial Intelligence Review, 50. pp. 201-240. ISSN 0269-2821. E-ISSN 1573-7462. (doi:10.1007/s10462-017-9541-y) (KAR id:61063)

PDF Author's Accepted Manuscript
Language: English
Official URL
http://dx.doi.org/10.1007/s10462-017-9541-y

Abstract

Hierarchical feature selection is a new research area in machine learning/data mining, which consists of performing feature selection by exploiting dependency relationships among hierarchically structured features. This paper evaluates four hierarchical feature selection methods, i.e., HIP, MR, SHSEL and GTD, used together with four types of lazy learning-based classifiers, i.e., Naïve Bayes, Tree Augmented Naïve Bayes, Bayesian Network Augmented Naïve Bayes and k-Nearest Neighbors classifiers. These four hierarchical feature selection methods are compared with each other and with a well-known “flat” feature selection method, i.e., Correlation-based Feature Selection. The adopted bioinformatics datasets consist of aging-related genes used as instances and Gene Ontology terms used as hierarchical features. The experimental results reveal that the HIP (Select Hierarchical Information Preserving Features) method performs best overall, in terms of predictive accuracy and robustness when coping with data where the instances’ classes have a substantially imbalanced distribution. This paper also reports a list of the Gene Ontology terms that were most often selected by the HIP method.
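
As a rough illustration of what "exploiting dependency relationships among hierarchically structured features" can mean, the sketch below removes features whose values are implied by other features in a single instance. It is not a reproduction of HIP, MR, SHSEL or GTD. It assumes binary features arranged in an is-a hierarchy, where a value of 1 for a term implies 1 for all its ancestors and a value of 0 implies 0 for all its descendants. The toy hierarchy, the term names and the function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each term maps to its direct parents (is-a).
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B", "C"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `term` by walking parent links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def descendants(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term that has `term` among its ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


def select_nonredundant(instance: Dict[str, int],
                        parents: Dict[str, List[str]]) -> Set[str]:
    """Keep only features whose value is not implied by another feature.

    Assumption (for illustration only): a feature valued 1 makes its
    ancestors redundant, and a feature valued 0 makes its descendants
    redundant, for this one instance.
    """
    redundant: Set[str] = set()
    for term, value in instance.items():
        if value == 1:
            redundant |= ancestors(term, parents)
        else:
            redundant |= descendants(term, parents)
    return set(instance) - redundant


# Hypothetical instance whose values are consistent with the hierarchy.
EXAMPLE = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 0}
SELECTED = select_nonredundant(EXAMPLE, PARENTS)
```

Here `D = 1` already implies `B = 1` and `A = 1`, and `C = 0` already implies `E = 0`, so only `C` and `D` are kept for this instance. Because the selection depends on the feature values of the instance being classified, a procedure of this kind pairs naturally with lazy learning, where the model is built per test instance.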

Item Type: Article
DOI/Identification number: 10.1007/s10462-017-9541-y
Uncontrolled keywords: data mining, machine learning, classification, bioinformatics, gene ontology, feature selection
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Alex Freitas
Date Deposited: 28 Mar 2017 12:03 UTC
Last Modified: 16 Feb 2021 13:44 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/61063 (The current URI for this page, for reference purposes)
Freitas, Alex A.: https://orcid.org/0000-0001-9825-4700