Identifying Influential Observations in Hierarchical Cluster-Analysis

Jolliffe, I.T. and Jones, B. and Morgan, B.J.T. (1995) Identifying Influential Observations in Hierarchical Cluster-Analysis. Journal of Applied Statistics, 22 (1). pp. 61-80. ISSN 0266-4763. (The full text of this publication is not available from this repository)

Official URL
http://dx.doi.org/10.1080/757584398

Abstract

In a cluster analysis of a multivariate data set, it may happen that one or two observations have a disproportionately large effect on the analysis, in the sense that their removal causes a dramatic change to the results. It is important to be able to identify such influential observations, and the present paper addresses this problem. To do so, we must first quantify the effect of a single observation. Various definitions are discussed, and criteria for identifying influential observations are investigated; the minimum spanning tree and the number of neighbours of each observation are considered. The investigation concentrates on single-link cluster analysis, although complete-link analysis is also briefly discussed. Patterns emerge in both real and simulated data, which suggest ways of predicting observations with no effect and those with the greatest effect. It is not necessary to recalculate the results with each observation omitted, an economy of presentation as well as labour.
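To make the ideas in the abstract concrete, the following is a minimal sketch and not the authors' method. It relies on the standard result that single-link merge heights equal the sorted edge lengths of the minimum spanning tree, and it compares the brute-force approach (recomputing with each observation omitted) against a cheap structural quantity, the number of MST neighbours of each observation. The function names, the use of Euclidean distance, and the choice of merge heights as the summary of "the results" are all assumptions made for illustration; the paper's own definitions of influence may differ.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, distance) edges of a minimum spanning tree.
    Euclidean distance is an assumption; any dissimilarity could be used.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n       # tree vertex that achieves that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def merge_heights(points):
    """Single-link merge heights: the sorted MST edge lengths."""
    return sorted(w for _, _, w in mst_edges(points))


def mst_degrees(points):
    """Number of MST neighbours of each observation."""
    deg = [0] * len(points)
    for i, j, _ in mst_edges(points):
        deg[i] += 1
        deg[j] += 1
    return deg


def leave_one_out_heights(points):
    """Brute force: recompute the merge heights with each observation omitted."""
    return [merge_heights(points[:k] + points[k + 1:]) for k in range(len(points))]
```

For four collinear points at 0, 1, 2 and 10, the two end points each have a single MST neighbour, and omitting either one simply deletes its edge from the tree. Omitting an interior point forces a new, longer edge to bridge the gap, so the remaining merge heights change.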

Item Type: Article
Subjects: H Social Sciences > HA Statistics
Divisions: Faculties > Science Technology and Medical Studies > School of Mathematics Statistics and Actuarial Science > Statistics
Depositing User: P. Ogbuji
Date Deposited: 29 May 2009 08:34
Last Modified: 02 Jun 2009 21:29
Resource URI: http://kar.kent.ac.uk/id/eprint/19626