Kent Academic Repository

A study on the statistical evaluation of classifiers

Neumann, Nadine M., Plastino, Alexandre, Pinto Junior, Jony A., Freitas, Alex A. (2020) A study on the statistical evaluation of classifiers. Knowledge Engineering Review, 36 (e1). pp. 1-26. ISSN 0269-8889. E-ISSN 1469-8005. (doi:10.1017/S0269888920000417) (KAR id:87091)


Statistical significance analysis, based on hypothesis tests, is a common approach for comparing classifiers. However, many studies oversimplify this analysis by merely checking the condition p-value < 0.05, ignoring important concepts such as the effect size and the statistical power of the test. The problem is serious enough that the American Statistical Association has taken a strong stand on the subject, noting that although the p-value is a useful statistical measure, it has been misused and misinterpreted. This work highlights problems caused by the misuse of hypothesis tests and shows how the effect size and the power of the test can provide important information for better decision-making. To investigate these issues, we perform empirical studies with different classifiers and 50 datasets, using the Student’s t-test and the Wilcoxon test to compare classifiers. The results show that an isolated p-value analysis can lead to wrong conclusions and that evaluating the effect size and the power of the test contributes to more principled decision-making.
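As a rough illustration of the quantities the abstract names, the sketch below computes a paired effect size (Cohen's d on the per-dataset differences), the corresponding paired t statistic, and an approximate power value for two classifiers evaluated on the same datasets. It is not the authors' procedure: the accuracy values are invented for illustration, the function name `paired_summary` is hypothetical, and the power figure uses a normal approximation rather than the noncentral t distribution. It relies only on the Python standard library; obtaining the actual p-values would require a statistics package (for example, SciPy's `ttest_rel` and `wilcoxon`).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_summary(acc_a, acc_b, alpha=0.05):
    """Effect size, t statistic and approximate power for paired scores.

    Hypothetical helper for illustration only.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)

    # Cohen's d for paired samples: mean difference in units of its std. dev.
    d = mean(diffs) / stdev(diffs)

    # Paired t statistic: mean(diffs) / (stdev(diffs) / sqrt(n)) = d * sqrt(n)
    t_stat = d * sqrt(n)

    # Two-sided power under a normal approximation (an assumption of this
    # sketch; an exact value would use the noncentral t distribution).
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(n)
    power = z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

    return {"n": n, "cohens_d": d, "t_stat": t_stat, "approx_power": power}


# Invented accuracies of two classifiers on the same 10 datasets.
acc_a = [0.81, 0.79, 0.90, 0.67, 0.73, 0.85, 0.92, 0.76, 0.88, 0.70]
acc_b = [0.80, 0.78, 0.91, 0.66, 0.72, 0.84, 0.91, 0.77, 0.87, 0.69]

result = paired_summary(acc_a, acc_b)
for name, value in result.items():
    print(f"{name}: {value}")
```

Reporting the effect size and power alongside the p-value makes it visible when a "significant" result reflects a negligible difference, or when a non-significant result comes from a test with too little power to detect a real one.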

Item Type: Article
DOI/Identification number: 10.1017/S0269888920000417
Uncontrolled keywords: data mining, machine learning, classification, statistical significance test
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Alex Freitas
Date Deposited: 13 Mar 2021 15:58 UTC
Last Modified: 04 Mar 2024 15:21 UTC