Arif, Arooj, Hartung, Tobias, Botoeva, Elena, Koliousis, Alexandros (2026) TestifAI: Tomography-Based Testing for Deep Learning Systems. In: 48th IEEE/ACM International Conference on Software Engineering (ICSE 2026). (In press) (KAR id:114631)
PDF
Author's Accepted Manuscript
Language: English
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
As AI systems are increasingly deployed in safety-critical application domains (e.g., autonomous driving), associated risks increase too. Deep learning models underlying modern AI systems, therefore, must undergo thorough testing to ensure their correct behaviour. A single robustness test involves thousands of inferences to empirically verify if a model’s outputs remain stable under a bounded perturbation of its inputs. However, existing testing frameworks lack the means to systematically explore and summarise robustness across a combinatorial space of perturbations.
We propose TestifAI, a deep learning testing framework for efficient and accurate estimation of robustness against combinations of perturbations. TestifAI enables users to specify operational conditions as structured spaces of semantic input perturbations (e.g., image blur, brightness and zoom) and discrete severity levels (e.g., low, medium and high). Users can query model robustness for any combination (e.g., “low blur, high brightness, and medium zoom”). To achieve efficiency and accuracy, TestifAI introduces partial model tomography, a novel approach to reconstructing model behaviour in a multi-perturbation space from tests that apply only a small number of perturbations (lower-order projections). To estimate robustness against at least three perturbations, TestifAI trains an auxiliary model on the results of tests involving up to two perturbations only, avoiding execution of an exponential number of tests. Our experiments on five image and language classification tasks show that TestifAI can predict higher-order (3 and 4 perturbations) test outcomes from low-order (1 and 2 perturbations) observations with an aggregate robustness estimation error of less than 7%, while reducing the number of inferences by 60–80%.
| Item Type: | Conference proceeding |
|---|---|
| Uncontrolled keywords: | Deep learning testing, Model robustness, AI safety, Combinatorial testing, Input perturbations |
| Subjects: | Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, |
| Institutional Unit: | Schools > School of Computing |
| Former Institutional Unit: | There are no former institutional units. |
| Funders: | University of Kent (https://ror.org/00xkeyj56) |
| Depositing User: | Elena Botoeva |
| Date Deposited: | 09 May 2026 13:06 UTC |
| Last Modified: | 09 May 2026 13:07 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/114631 (The current URI for this page, for reference purposes) |

https://orcid.org/0000-0001-5881-0258