Jolliffe, Ian T. and Jones, B. and Morgan, Byron J. T. (1995) Identifying Influential Observations in Hierarchical Cluster-Analysis. Journal of Applied Statistics, 22 (1). pp. 61-80. ISSN 0266-4763.
In a cluster analysis of a multivariate data set, it may happen that one or two observations have a disproportionately large effect on the analysis, in the sense that their removal causes a dramatic change to the results. It is important to be able to identify such influential observations, and the present paper addresses this problem. To do so, we must first quantify the effect of a single observation. Various definitions are discussed, and criteria for identifying influential observations are investigated; the minimum spanning tree and the number of neighbours of each observation are considered. The investigation concentrates on single-link cluster analysis, although complete-link analysis is also briefly discussed. Patterns emerge in both real and simulated data, which suggest ways of predicting observations with no effect and those with the greatest effect. It is not necessary to recalculate the results with each observation omitted, an economy of presentation as well as labour.
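As an illustration of what "the effect of a single observation" can mean, the following Python sketch computes a brute-force leave-one-out influence score for single-link clustering. It is not the paper's method: the score used here (one minus the Rand index between the full partition and the partition obtained with one observation omitted), the fixed number of clusters `k`, and all function names are assumptions made for this example. The sketch relies on the standard fact that a single-link partition into `k` clusters can be obtained by deleting the `k - 1` longest edges of the minimum spanning tree. It performs exactly the repeated recalculation that the abstract says can be avoided, so it is best read as a baseline.

```python
from itertools import combinations
from math import dist


def mst_edges(points):
    """Prim's algorithm on Euclidean distances; returns edges as (length, i, j)."""
    n = len(points)
    if n < 2:
        return []
    # best[j] = (distance from j to the current tree, nearest tree vertex)
    best = {j: (dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])
        d, i = best.pop(j)
        edges.append((d, i, j))
        for v in best:
            dv = dist(points[j], points[v])
            if dv < best[v][0]:
                best[v] = (dv, j)
    return edges


def single_link_labels(points, k):
    """Single-link partition into k clusters: keep the n - k shortest MST edges."""
    n = len(points)
    kept = sorted(mst_edges(points))[: n - k]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def rand_index(a, b):
    """Fraction of pairs on which two labelings agree (together vs. apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def influence(points, k):
    """Assumed score: 1 - Rand index between full and leave-one-out partitions."""
    full = single_link_labels(points, k)
    scores = []
    for r in range(len(points)):
        reduced = points[:r] + points[r + 1:]
        loo = single_link_labels(reduced, k)
        full_without_r = full[:r] + full[r + 1:]
        scores.append(1 - rand_index(full_without_r, loo))
    return scores


# Hypothetical data: two tight groups on a line plus one isolated observation.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (6.0, 0.0), (7.0, 0.0), (8.0, 0.0),
          (20.0, 0.0)]
scores = influence(points, k=2)
print(scores)
```

With `k = 2`, the isolated observation forms a cluster on its own in the full analysis, so omitting it forces the remaining points to split into the two tight groups. Under the assumed score it is therefore the only observation with a nonzero value in this toy data set; omitting any other point leaves the partition of the remaining points unchanged.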
|Subjects:||H Social Sciences > HA Statistics|
|Divisions:||Faculties > Science Technology and Medical Studies > School of Mathematics Statistics and Actuarial Science > Statistics|
|Depositing User:||P. Ogbuji|
|Date Deposited:||29 May 2009 08:34|
|Last Modified:||10 Jul 2014 15:33|
|Resource URI:||https://kar.kent.ac.uk/id/eprint/19626|