
A data-driven missing value imputation approach for longitudinal datasets

Ribeiro, Caio, Freitas, Alex A. (2021) A data-driven missing value imputation approach for longitudinal datasets. Artificial Intelligence Review. ISSN 0269-2821. E-ISSN 1573-7462. (doi:10.1007/s10462-021-09963-5) (KAR id:88186)

Language: English


Official URL
https://doi.org/10.1007/s10462-021-09963-5

Abstract

Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.
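The feature-wise selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the five imputation methods or the exact error measure, so the three candidates below (mean, median, and a "previous wave" carry-over standing in for a longitudinal method), the leave-one-out mean absolute error, and all function and feature names are assumptions made for illustration only.

```python
from statistics import mean, median

# Hypothetical candidate methods. Each returns an estimate for
# rows[i][feature], or None when it cannot produce one.

def impute_mean(rows, feature, i):
    """Mean of the feature's known values in all other rows."""
    known = [r[feature] for j, r in enumerate(rows)
             if j != i and r[feature] is not None]
    return mean(known) if known else None

def impute_median(rows, feature, i):
    """Median of the feature's known values in all other rows."""
    known = [r[feature] for j, r in enumerate(rows)
             if j != i and r[feature] is not None]
    return median(known) if known else None

def make_previous_wave(prev_feature):
    """Longitudinal-style candidate: reuse the same individual's value
    of the same variable from an earlier wave (prev_feature)."""
    def impute(rows, feature, i):
        return rows[i].get(prev_feature)
    return impute

def rank_methods(rows, feature, methods):
    """Rank methods for one feature by mean absolute error, estimating
    each known value as if it were missing (leave-one-out)."""
    scores = {}
    for name, method in methods.items():
        errors = []
        for i, row in enumerate(rows):
            truth = row[feature]
            if truth is None:
                continue
            estimate = method(rows, feature, i)
            if estimate is not None:
                errors.append(abs(estimate - truth))
        # A method that never produced an estimate is ranked last.
        scores[name] = mean(errors) if errors else float("inf")
    return sorted(scores, key=scores.get)

def impute_feature(rows, feature, methods):
    """Fill missing values of one feature with the best-ranked method
    that can produce an estimate; returns a copy and the ranking."""
    ranking = rank_methods(rows, feature, methods)
    filled = [dict(r) for r in rows]
    for i, row in enumerate(rows):
        if row[feature] is None:
            for name in ranking:
                estimate = methods[name](rows, feature, i)
                if estimate is not None:
                    filled[i][feature] = estimate
                    break
    return filled, ranking

# Toy data (invented): one variable measured in two waves, None = missing.
rows = [
    {"bmi_w1": 22.0, "bmi_w2": 22.5},
    {"bmi_w1": 30.0, "bmi_w2": 30.4},
    {"bmi_w1": 26.0, "bmi_w2": None},
    {"bmi_w1": 19.0, "bmi_w2": 19.2},
    {"bmi_w1": 35.0, "bmi_w2": 34.0},
]
methods = {
    "mean": impute_mean,
    "median": impute_median,
    "previous_wave": make_previous_wave("bmi_w1"),
}
filled, ranking = impute_feature(rows, "bmi_w2", methods)
```

In this toy data the earlier-wave value tracks the later one closely, so the carry-over candidate is ranked first for `bmi_w2` and supplies the missing value. Falling back through the ranking when a method cannot produce an estimate is one possible way to handle methods that are not applicable to every missing value; the abstract mentions applicability but does not specify how it is handled.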

Item Type: Article
DOI/Identification number: 10.1007/s10462-021-09963-5
Uncontrolled keywords: machine learning, data mining, missing values, longitudinal classification, longitudinal datasets
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Alex Freitas
Date Deposited: 16 May 2021 10:01 UTC
Last Modified: 18 May 2021 12:06 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/88186 (The current URI for this page, for reference purposes)
Freitas, Alex A.: https://orcid.org/0000-0001-9825-4700