Combining clustering and classification ensembles: A novel pipeline to identify breast cancer profiles

Agrawal, Utkarsh, Soria, Daniele, Wagner, Christian, Garibaldi, Jonathan, Ellis, Ian O., Bartlett, John M.S., Cameron, David, Rakha, Emad A., Green, Andrew R. (2019) Combining clustering and classification ensembles: A novel pipeline to identify breast cancer profiles. Artificial Intelligence in Medicine, 97. pp. 27-37. ISSN 0933-3657. (doi:10.1016/j.artmed.2019.05.002) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided)

PDF - Author's Accepted Manuscript
Restricted to Repository staff only until 15 May 2020.

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL
https://dx.doi.org/10.1016/j.artmed.2019.05.002

Abstract

Breast cancer is one of the most common causes of cancer death in women and is a very complex disease with varied molecular alterations. Classifying patients into biological groups supports prognosis and is of great significance for treatment strategies. Recent studies have used an ensemble of multiple clustering algorithms to elucidate the most characteristic biological groups of breast cancer. However, combining various clustering methods left a number of patients unclustered. A framework is therefore still needed that can assign as many of these unclustered (i.e. biologically diverse) patients as possible to one of the identified groups in order to improve classification. In this paper we develop a novel classification framework that introduces a new ensemble classification stage after the ensemble clustering stage to target the unclustered patients. The result is a step-by-step pipeline that couples ensemble clustering with ensemble classification to identify core groups, characterise the distribution of data within them, and improve the final classification results by targeting the unclustered data. The proposed pipeline is applied to a novel real-world breast cancer dataset, and its robustness and stability are then examined by testing it on standard datasets. The results show that the presented framework yields an improved classification. Finally, the results have been verified using statistical tests, visualisation techniques, cluster quality assessment and interpretation from clinical experts.
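
The two-stage idea in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation: the abstract does not name the clustering or classification algorithms, so the choices below (unanimous agreement as the consensus rule, a nearest-centroid classifier plus 1-NN and 3-NN as the classification ensemble, a majority vote with a hypothetical `min_votes` threshold) are assumptions made purely for illustration. The toy points and label vectors are invented, and cluster labels are assumed to be already aligned across the clustering algorithms.

```python
from collections import Counter
import math


def consensus_labels(labelings):
    """Keep a sample's label only when every clustering agrees, else None.

    Assumption: labels are already aligned across the clustering algorithms.
    """
    return [votes[0] if len(set(votes)) == 1 else None
            for votes in zip(*labelings)]


def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean point lies closest to x."""
    groups = {}
    for xi, yi in zip(train_X, train_y):
        groups.setdefault(yi, []).append(xi)
    centroids = {y: [sum(col) / len(pts) for col in zip(*pts)]
                 for y, pts in groups.items()}
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def knn(train_X, train_y, x, k):
    """Predict the most common class among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def assign_unclustered(X, consensus, min_votes=2):
    """Train on core (agreed) samples, then vote on each unclustered sample.

    A sample stays None if no class reaches min_votes (hypothetical rule).
    """
    train_X = [x for x, y in zip(X, consensus) if y is not None]
    train_y = [y for y in consensus if y is not None]
    final = list(consensus)
    for i, (x, y) in enumerate(zip(X, consensus)):
        if y is None:
            votes = Counter([
                nearest_centroid(train_X, train_y, x),
                knn(train_X, train_y, x, 1),
                knn(train_X, train_y, x, 3),
            ])
            label, count = votes.most_common(1)[0]
            if count >= min_votes:
                final[i] = label
    return final


# Invented toy data: two well-separated groups plus two ambiguous samples.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (1, 1), (5, 4)]
labelings = [
    [0, 0, 0, 1, 1, 1, 0, 1],  # clustering algorithm A
    [0, 0, 0, 1, 1, 1, 1, 1],  # clustering algorithm B
    [0, 0, 0, 1, 1, 1, 0, 0],  # clustering algorithm C
]

consensus = consensus_labels(labelings)
final = assign_unclustered(X, consensus)
print(consensus)  # [0, 0, 0, 1, 1, 1, None, None]
print(final)      # [0, 0, 0, 1, 1, 1, 0, 1]
```

In this sketch the last two samples are left unclustered by the consensus step because the three clusterings disagree on them; the classification ensemble, trained only on the six core samples, then assigns each of them to a group.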

Item Type: Article
DOI/Identification number: 10.1016/j.artmed.2019.05.002
Uncontrolled keywords: Ensemble clustering; Ensemble classification; Class level fusion; Refining cluster results; Breast cancer; Pipeline
Subjects: R Medicine > R Medicine (General) > R858 Computer applications to medicine. Medical informatics. Medical information technology
Divisions: Faculties > Sciences > School of Computing
Depositing User: Daniele Soria
Date Deposited: 12 Sep 2019 07:51 UTC
Last Modified: 16 Jan 2020 09:24 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/76421 (The current URI for this page, for reference purposes)
Soria, Daniele: https://orcid.org/0000-0002-0164-8218