TestifAI: Tomography-Based Testing for Deep Learning Systems

Arif, Arooj, Hartung, Tobias, Botoeva, Elena, Koliousis, Alexandros (2026) TestifAI: Tomography-Based Testing for Deep Learning Systems. In: 48th IEEE/ACM International Conference on Software Engineering (ICSE 2026). (In press) (KAR id:114631)

PDF: Author's Accepted Manuscript. Language: English. This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

As AI systems are increasingly deployed in safety-critical application domains (e.g., autonomous driving), associated risks increase too. Deep learning models underlying modern AI systems, therefore, must undergo thorough testing to ensure their correct behaviour. A single robustness test involves thousands of inferences to empirically verify if a model’s outputs remain stable under a bounded perturbation of its inputs. However, existing testing frameworks lack the means to systematically explore and summarise robustness across a combinatorial space of perturbations.
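To make the notion of a single robustness test concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical scalar classifier (`toy_model`) and a hypothetical bounded noise perturbation (`bounded_noise`); the score is the fraction of perturbed inferences whose predicted label matches the unperturbed prediction.

```python
import random


def robustness(model, inputs, perturb, trials=100, seed=0):
    """Fraction of perturbed inferences whose label equals the clean label."""
    rng = random.Random(seed)
    stable = 0
    total = 0
    for x in inputs:
        base = model(x)  # prediction on the unperturbed input
        for _ in range(trials):
            if model(perturb(x, rng)) == base:
                stable += 1
            total += 1
    return stable / total


def toy_model(x):
    # Hypothetical stand-in for a deep learning model: a threshold classifier.
    return int(x >= 0.5)


def bounded_noise(eps):
    # Hypothetical perturbation: additive noise bounded by eps in absolute value.
    def perturb(x, rng):
        return x + rng.uniform(-eps, eps)
    return perturb


inputs = [0.1, 0.45, 0.55, 0.9]
score_small = robustness(toy_model, inputs, bounded_noise(0.01))
score_large = robustness(toy_model, inputs, bounded_noise(0.2))
print(score_small, score_large)
```

Even this toy test runs 400 perturbed inferences per perturbation bound, which illustrates why repeating such tests for every combination of perturbations quickly becomes expensive.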

We propose TestifAI, a deep learning testing framework for efficient and accurate estimation of robustness against combinations of perturbations. TestifAI enables users to specify operational conditions as structured spaces of semantic input perturbations (e.g., image blur, brightness and zoom) and discrete severity levels (e.g., low, medium and high). Users can query model robustness for any combination (e.g., “low blur, high brightness, and medium zoom”). To achieve efficiency and accuracy, TestifAI introduces partial model tomography, a novel approach to reconstructing model behaviour in a multi-perturbation space from tests that apply only a small number of perturbations (lower-order projections). To estimate robustness against at least three perturbations, TestifAI trains an auxiliary model on the results of tests involving up to two perturbations only, avoiding execution of an exponential number of tests. Our experiments on five image and language classification tasks show that TestifAI can predict higher-order (3 and 4 perturbations) test outcomes from low-order (1 and 2 perturbations) observations with an aggregate robustness estimation error of less than 7%, while reducing the number of inferences by 60–80%.
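The size of the perturbation space described above can be illustrated with a short sketch. It is an assumption-laden illustration rather than TestifAI's API: the fourth perturbation (`rotation`) and the helper names `configurations` and `count` are hypothetical, and the toy numbers do not correspond to the paper's experimental setup. With k perturbations and s severity levels, there are C(k, r) * s^r configurations that apply exactly r perturbations.

```python
from itertools import combinations, product
from math import comb

# "blur", "brightness" and "zoom" come from the abstract; "rotation" is a
# hypothetical fourth perturbation added so that order-4 tests exist.
PERTURBATIONS = ["blur", "brightness", "zoom", "rotation"]
SEVERITIES = ["low", "medium", "high"]


def configurations(order):
    """Yield every test configuration applying exactly `order` perturbations."""
    for names in combinations(PERTURBATIONS, order):
        for levels in product(SEVERITIES, repeat=order):
            yield dict(zip(names, levels))


def count(order, k=len(PERTURBATIONS), s=len(SEVERITIES)):
    """Closed-form number of configurations of a given order: C(k, r) * s**r."""
    return comb(k, order) * s ** order


# Low-order tests are the ones that would actually be executed; high-order
# outcomes are the ones an auxiliary model would be asked to predict.
low_order = [c for r in (1, 2) for c in configurations(r)]
high_order = [c for r in (3, 4) for c in configurations(r)]

print(len(low_order), len(high_order))  # 66 189
print(high_order[0])
```

In this toy space, 66 low-order configurations would be executed and 189 higher-order configurations would be left to estimation; the gap widens as more perturbations or severity levels are added.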

Item Type:	Conference proceeding
Uncontrolled keywords:	Deep learning testing, Model robustness, AI safety, Combinatorial testing, Input perturbations
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	There are no former institutional units.
Funders:	University of Kent (https://ror.org/00xkeyj56)
Depositing User:	Elena Botoeva
Date Deposited:	09 May 2026 13:06 UTC
Last Modified:	09 May 2026 13:07 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/114631 (The current URI for this page, for reference purposes)

University of Kent Author Information

Botoeva, Elena.

Creator's ORCID:	https://orcid.org/0000-0001-5881-0258