Prioritizing positive feature values: a new hierarchical feature selection method

da Silva, Pablo Nascimento, Plastino, Alexandre, Freitas, Alex A. (2020) Prioritizing positive feature values: a new hierarchical feature selection method. Applied Intelligence. ISSN 0924-669X. E-ISSN 1573-7497. (doi:10.1007/s10489-020-01782-5) (KAR id:82231)

PDF Author's Accepted Manuscript Language: English
Official URL: https://dx.doi.org/10.1007/s10489-020-01782-5

Abstract

In this work, we address the problem of feature selection for the classification task in hierarchical and sparse feature spaces, which characterise many real-world applications. A binary feature space is deemed hierarchical when its binary features are related via generalization-specialization relationships, and is considered sparse when, in general, the instances contain far fewer “positive” than “negative” feature values. In any given instance, a feature value is deemed positive (negative) when the property associated with the feature has been (has not been) observed for that instance. Although there are many methods for the traditional feature selection problem in the literature, the proper treatment of hierarchical feature structures is still a challenge. Hence, we introduce a novel hierarchical feature selection method that follows the lazy learning paradigm, selecting a feature subset tailored for each instance in the test set. Our strategy prioritizes the selection of features with positive values, since they tend to be more informative: the presence of a relatively rare property is usually a more relevant piece of information than the absence of that property. Experiments on different application domains have shown that the proposed method outperforms previous hierarchical feature selection methods and also traditional methods in terms of predictive accuracy, while in general selecting smaller feature subsets.
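The definitions in the abstract can be made concrete with a small sketch. The code below is an illustration only and is not the method proposed in the paper: the toy hierarchy `PARENTS`, the helper names, and the "keep the most specific positive features" rule are all hypothetical. It also assumes the usual convention that a positive value for a feature implies positive values for all of its generalizations, which the abstract does not state explicitly.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each feature maps to its direct generalizations.
PARENTS: Dict[str, List[str]] = {
    "animal": [],
    "mammal": ["animal"],
    "bird": ["animal"],
    "fish": ["animal"],
    "dog": ["mammal"],
    "cat": ["mammal"],
    "sparrow": ["bird"],
    "eagle": ["bird"],
    "salmon": ["fish"],
}


def ancestors(feature: str) -> Set[str]:
    """Return every generalization of a feature by walking up the hierarchy."""
    seen: Set[str] = set()
    stack = list(PARENTS[feature])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def is_hierarchy_consistent(instance: Dict[str, int]) -> bool:
    """Assumed convention: a positive feature implies positive ancestors."""
    return all(
        instance[ancestor] == 1
        for feature, value in instance.items()
        if value == 1
        for ancestor in ancestors(feature)
    )


def positive_ratio(instance: Dict[str, int]) -> float:
    """Fraction of positive values; a sparse space has a low ratio on average."""
    return sum(instance.values()) / len(instance)


def most_specific_positives(instance: Dict[str, int]) -> Set[str]:
    """Toy lazy, per-instance selection (NOT the paper's algorithm):
    keep positive features that are not implied by a more specific positive one."""
    positives = {f for f, v in instance.items() if v == 1}
    implied: Set[str] = set()
    for feature in positives:
        implied |= ancestors(feature)
    return positives - implied


# One test instance: only "dog" and its generalizations were observed.
instance = {f: 0 for f in PARENTS}
instance.update({"animal": 1, "mammal": 1, "dog": 1})

selected = most_specific_positives(instance)
```

Because the selection is computed from the values of a single instance, a different test instance would generally yield a different feature subset, which is the sense in which a selection method is "lazy".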

Item Type:	Article
DOI/Identification number:	10.1007/s10489-020-01782-5
Uncontrolled keywords:	machine learning, data mining, classification, feature selection
Subjects:	Q Science > Q Science (General) > Q335 Artificial intelligence
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Alex Freitas
Date Deposited:	25 Jul 2020 12:06 UTC
Last Modified:	28 Apr 2026 09:12 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/82231 (The current URI for this page, for reference purposes)

University of Kent Author Information

Freitas, Alex A.

Creator's ORCID:	https://orcid.org/0000-0001-9825-4700