Kent Academic Repository

An Improved Model Selection Heuristic for AUC

Wu, Shaomin and Flach, Peter A. and Ferri, Cesar (2007) An Improved Model Selection Heuristic for AUC. In: Machine Learning: ECML 2007, 18th European Conference on Machine Learning. Lecture Notes in Computer Science. Springer, Berlin, Germany, pp. 478-489. ISBN 978-3-540-74957-8. E-ISBN 978-3-540-74958-5. (doi:10.1007/978-3-540-74958-5_44) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided.) (KAR id:31016)

PDF Publisher pdf
Language: English

Restricted to Repository staff only
Official URL:
http://dx.doi.org/10.1007/978-3-540-74958-5_44

Abstract

The area under the ROC curve (AUC) has been widely used to measure ranking performance for binary classification tasks. AUC only employs the classifier's scores to rank the test instances; thus, it ignores other valuable information conveyed by the scores, such as sensitivity to small differences in the score values. However, as such differences are inevitable across samples, ignoring them may lead to overfitting the validation set when selecting models with high AUC. This problem is tackled in this paper. On the basis of ranks as well as scores, we introduce a new metric called scored AUC (sAUC), which is the area under the sROC curve. The latter measures how quickly AUC deteriorates if positive scores are decreased. We study the interpretation and statistical properties of sAUC. Experimental results on UCI data sets convincingly demonstrate the effectiveness of the new metric for classifier evaluation and selection in the case of limited validation data. © Springer-Verlag Berlin Heidelberg 2007.
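
The idea in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes scores lie in [0, 1], assumes the sROC curve plots AUC against the amount tau subtracted from every positive score, and uses hypothetical function names (`auc`, `shifted_auc`, `sauc`, `sauc_numeric`). Under those assumptions, the area under that curve reduces to the mean positive margin between positive and negative scores over all pairs.

```python
def auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def shifted_auc(pos, neg, tau):
    """AUC after lowering every positive score by tau (strict wins only).

    Assumption: this is one point of the sROC curve described in the abstract.
    """
    wins = sum(1 for p in pos for n in neg if p - tau > n)
    return wins / (len(pos) * len(neg))


def sauc(pos, neg):
    """Assumed closed form: mean of max(p - n, 0) over all pairs.

    For scores in [0, 1] this equals the integral of shifted_auc over
    tau in [0, 1], because each pair keeps winning until tau reaches p - n.
    """
    return sum(max(p - n, 0.0) for p in pos for n in neg) / (len(pos) * len(neg))


def sauc_numeric(pos, neg, steps=1000):
    """Midpoint-rule approximation of the area under the assumed sROC curve."""
    h = 1.0 / steps
    return sum(shifted_auc(pos, neg, (k + 0.5) * h) for k in range(steps)) * h


# Two hypothetical models with identical rankings but different score margins.
model_a = ([0.9, 0.8], [0.2, 0.1])      # (positive scores, negative scores)
model_b = ([0.55, 0.52], [0.5, 0.48])

for name, (pos, neg) in (("A", model_a), ("B", model_b)):
    print(name, auc(pos, neg), sauc(pos, neg), sauc_numeric(pos, neg))
```

Both hypothetical models rank every positive above every negative, so plain AUC cannot separate them, whereas the margin-based value is much larger for model A (about 0.7 versus about 0.045). That is the kind of score information the abstract says AUC ignores.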

Item Type: Book section
DOI/Identification number: 10.1007/978-3-540-74958-5_44
Additional information: Publication year: 2007. Author affiliations: Cranfield University, United Kingdom; University of Bristol, United Kingdom; Universitat Politècnica de València, Spain. Journal abbreviation: Lect. Notes Comput. Sci.
Uncontrolled keywords: Database systems, Heuristic methods, Mathematical models, Problem solving, Sensitivity analysis, Statistical methods, Binary classification tasks, Classifier evaluation, Data sets, Statistical properties, Classification (of information)
Subjects: H Social Sciences
H Social Sciences > HA Statistics > HA33 Management Science
Divisions: Divisions > Kent Business School - Division > Department of Analytics, Operations and Systems
Depositing User: Shaomin Wu
Date Deposited: 28 Sep 2012 15:26 UTC
Last Modified: 16 Nov 2021 10:08 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/31016 (The current URI for this page, for reference purposes)
