Maia, Marcelo Rodrigues de Holanda, Plastino, Alexandre, Freitas, Alex A., de Magalhaes, Joao Pedro (2022) Interpretable Ensembles of Classifiers for Uncertain Data with Bioinformatics Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics, pp. 1-12. ISSN 1557-9964. (doi:10.1109/tcbb.2022.3218588) (KAR id:98040)
PDF
Author's Accepted Manuscript
Language: English
Official URL: https://doi.org/10.1109/tcbb.2022.3218588
Abstract
Data uncertainty remains a challenging issue in many applications, but few classification algorithms can effectively cope with it. An ensemble approach for uncertain categorical features has recently been proposed, achieving promising results. It biases the sampling of features for each model in an ensemble so that less uncertain features are more likely to be sampled. Here we extend this idea of biased sampling and propose two new approaches: one for selecting training instances for each model in an ensemble and another for sampling the features to be considered when splitting a node during Random Forest training. We applied these approaches to classify ageing-related genes and predict drugs' side effects based on uncertain features representing protein-protein and protein-chemical interactions. We show that ensembles based on our proposed approaches achieve better predictive performance. In particular, our proposed approaches improved the performance of a Random Forest based on the most sophisticated approach for handling uncertain data in ensembles of this kind. Furthermore, we propose two new approaches for interpreting an ensemble of Naive Bayes classifiers and analyse their results on our datasets of ageing-related genes and drugs' side effects.
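The biased feature sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how uncertainty is measured or how it is turned into sampling probabilities, so the per-feature `uncertainty` scores in [0, 1], the weighting `1 - uncertainty`, and the function name `biased_feature_sample` are all assumptions made for illustration.

```python
import random


def biased_feature_sample(uncertainty, k, rng=None, eps=1e-9):
    """Sample k distinct feature indices, favouring low-uncertainty features.

    uncertainty: per-feature scores in [0, 1] (a hypothetical measure,
    where 0 means fully certain and 1 means maximally uncertain).
    """
    rng = rng or random.Random()
    if not 0 <= k <= len(uncertainty):
        raise ValueError("k must be between 0 and the number of features")
    # Assumed weighting: weight = 1 - uncertainty, plus eps so that
    # no feature ends up with a weight of exactly zero.
    weights = [1.0 - u + eps for u in uncertainty]
    remaining = list(range(len(uncertainty)))
    chosen = []
    for _ in range(k):
        # Draw one remaining feature with probability proportional to its weight,
        # then remove it so features are sampled without replacement.
        position = rng.choices(
            range(len(remaining)),
            weights=[weights[i] for i in remaining],
        )[0]
        chosen.append(remaining.pop(position))
    return chosen


if __name__ == "__main__":
    # Hypothetical uncertainty scores for five features.
    scores = [0.05, 0.9, 0.1, 0.8, 0.2]
    # Feature subset for one ensemble member.
    print(biased_feature_sample(scores, k=3, rng=random.Random(0)))
```

Under these assumptions, calling the function once per ensemble member gives each model its own feature subset, with features 0 and 2 in the example appearing far more often than features 1 and 3. The same weighted draw could in principle be applied to training instances or to the candidate features at a tree node, which are the two extensions the abstract mentions, but the details of those would need to come from the paper itself.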
Item Type: | Article |
---|---|
DOI/Identification number: | 10.1109/tcbb.2022.3218588 |
Additional information: | ** Article version: VoR ** From Crossref journal articles via Jisc Publications Router ** Licence for VoR version of this article starting on 01-01-2022: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
Uncontrolled keywords: | Applied Mathematics, Genetics, Biotechnology |
Subjects: | Q Science > QA Mathematics (inc Computing science) |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Funders: | University of Kent (https://ror.org/00xkeyj56) |
SWORD Depositor: | JISC Publications Router |
Depositing User: | JISC Publications Router |
Date Deposited: | 30 Nov 2022 14:49 UTC |
Last Modified: | 05 Nov 2024 13:03 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/98040 (The current URI for this page, for reference purposes) |