de Sá, Alex G. C., Pappa, Gisele L., Freitas, Alex A., Ascher, D.B. (2025) Interpreting machine learning pipelines produced by evolutionary AutoML for biochemical property prediction. In: GECCO'25 Companion: Proceedings of the 2025 Genetic and Evolutionary Computation Conference Companion. pp. 1944-1952. ACM ISBN 979-8-4007-1464-1. (doi:10.1145/3712255.3734339) (KAR id:110941)
PDF: Author's Accepted Manuscript (Language: English)
Official URL: https://doi.org/10.1145/3712255.3734339
Abstract
Machine learning (ML) has been playing a crucial role in drug discovery, mainly through quantitative structure-activity relationship models that relate molecular structures to properties, such as absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. However, traditional ML approaches often lack customisation to a particular biochemical task and fail to generalise to new biochemical spaces, resulting in reduced predictive performance. Automated machine learning (AutoML) has emerged to address these limitations by automatically selecting the suitable ML pipelines for a given input dataset. Despite its potential, AutoML is underutilised in cheminformatics, and its decisions often lack interpretability, reducing user trust - especially among non-experts. Accordingly, this paper proposes an evolutionary AutoML method for biochemical property prediction that outputs an interpretable model for understanding the evolved ML pipelines. It combines grammar-based genetic programming with Bayesian networks to guide search and enhance the searched pipelines' interpretability. The evaluation on 12 benchmark ADMET datasets showed that the proposed AutoML method obtained similar or better results than three existing methods. Additionally, the interpretable Bayesian network identified, among the ML pipelines' components generated by the AutoML method (i.e. components like biochemical feature extraction methods, preprocessing techniques and ML algorithms), which components affect the ML pipelines' predictive performance.
| Item Type: | Conference or workshop item (Proceeding) |
|---|---|
| DOI/Identification number: | 10.1145/3712255.3734339 |
| Uncontrolled keywords: | supervised machine learning, classification, evolutionary algorithms, estimation of distribution algorithms, bioinformatics |
| Subjects: | Q Science > Q Science (General) > Q335 Artificial intelligence |
| Institutional Unit: | Schools > School of Computing |
| Former Institutional Unit: | There are no former institutional units. |
| Funders: | University of Kent (https://ror.org/00xkeyj56) |
| Depositing User: | Alex Freitas |
| Date Deposited: | 13 Aug 2025 09:04 UTC |
| Last Modified: | 14 Aug 2025 14:54 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/110941 (The current URI for this page, for reference purposes) |
https://orcid.org/0000-0001-9825-4700