A Generalized Methodology for Data Analysis

Angelov, Plamen P., Gu, Xiaowei, Principe, Jose C. (2018) A Generalized Methodology for Data Analysis. IEEE Transactions on Cybernetics, 48 (10). pp. 2981-2993. ISSN 2168-2267. (doi:10.1109/TCYB.2017.2753880) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:90112)

Official URL: https://doi.org/10.1109/TCYB.2017.2753880

Abstract

Based on a critical analysis of data analytics and its foundations, we propose a functional approach to estimating data ensemble properties. The approach is based entirely on the empirical observations of discrete data samples and the relative proximity of these points in the data space, and is hence named empirical data analysis (EDA). The ensemble functions include the nonparametric square centrality (a measure of closeness used in graph theory) and typicality (an empirically derived quantity which resembles probability). A distinctive feature of the proposed functional approach is that it assumes neither randomness nor determinism of the empirically observed data, nor their independence. The typicality is derived directly from the discrete data, in contrast to the traditional approach, where a continuous probability density function is assumed a priori. The typicality is expressed in a closed analytical form that can be calculated recursively and is thus computationally very efficient. The proposed nonparametric estimators of the ensemble properties of the data can also be interpreted as a discrete form of the information potential (known from information theoretic learning as well as from Parzen windows). EDA is therefore well suited to the current move to a data-rich environment, where the phenomena underlying the vast amounts of available data are often not clearly understood. We also present an extension of EDA for inference. The new methodology has a wide range of applications because it concerns the very foundation of data analysis. Preliminary tests show good performance in comparison to traditional techniques.
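
The abstract does not state the formulas behind these quantities, so the following is only an illustrative sketch under explicit assumptions: that proximity is measured by the squared Euclidean distance, that centrality is the reciprocal of a point's summed squared distance to all samples, and that typicality is the centrality normalised to sum to one. The names `RunningStats`, `cumulative_proximity`, and `typicality` are hypothetical and are not taken from the paper. What the sketch does show with certainty is the algebraic identity that makes a recursive calculation possible: for squared Euclidean distance, the sum of distances from a point to all K samples depends only on the running mean and the running mean of squared norms.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def cumulative_proximity_direct(points: List[Sequence[float]]) -> List[float]:
    """Reference O(K^2) computation: summed squared distance to every sample."""
    return [sum(sq_dist(p, q) for q in points) for p in points]


class RunningStats:
    """Keeps the mean and the mean squared norm, updated one sample at a time."""

    def __init__(self, dim: int) -> None:
        self.k = 0
        self.mean = [0.0] * dim
        self.mean_sq_norm = 0.0

    def update(self, x: Sequence[float]) -> None:
        self.k += 1
        w = 1.0 / self.k
        self.mean = [m + w * (xi - m) for m, xi in zip(self.mean, x)]
        self.mean_sq_norm += w * (sum(xi * xi for xi in x) - self.mean_sq_norm)

    def cumulative_proximity(self, x: Sequence[float]) -> float:
        # Identity: sum_j ||x - x_j||^2 = K * (||x - mu||^2 + X - ||mu||^2),
        # where mu is the sample mean and X the mean of squared norms.
        mu_sq = sum(m * m for m in self.mean)
        return self.k * (sq_dist(x, self.mean) + self.mean_sq_norm - mu_sq)


def typicality(points: List[Sequence[float]]) -> List[float]:
    """Assumed definition: normalised inverse cumulative proximity.

    Requires at least two distinct points; otherwise a proximity of zero
    would cause a division by zero.
    """
    stats = RunningStats(len(points[0]))
    for p in points:
        stats.update(p)
    centrality = [1.0 / stats.cumulative_proximity(p) for p in points]
    total = sum(centrality)
    return [c / total for c in centrality]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    for point, tau in zip(data, typicality(data)):
        print(point, round(tau, 4))
```

Under these assumptions the typicality values sum to one, like a discrete probability mass function, and a point far from the rest of the data receives a lower value than points in the dense region. No continuous density or kernel width is chosen at any stage.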

Item Type:	Article
DOI/Identification number:	10.1109/TCYB.2017.2753880
Uncontrolled keywords:	Data analysis; Random variables; Probability density function; Cybernetics; Graph theory; Data mining; Data mining and analysis; machine learning; pattern recognition; probability; statistics
Subjects:	Q Science > QA Mathematics (inc Computing science)
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Amy Boaler
Date Deposited:	09 Sep 2021 14:56 UTC
Last Modified:	05 Nov 2024 12:55 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/90112 (The current URI for this page, for reference purposes)

University of Kent Author Information

Gu, Xiaowei.
