Pomsuwan, Tossapol (2017) FEATURE SELECTION FOR THE CLASSIFICATION OF LONGITUDINAL HUMAN AGEING DATA. Master of Research (MRes) thesis, University of Kent. (KAR id:66568)

Language: English



We address the feature selection task in the special context of longitudinal data - where variables are repeatedly measured across different time points. When analysing longitudinal data, a standard feature selection method would typically ignore the temporal nature of the features and treat each feature value at a given time point as a separate feature. That is, a standard algorithm would ignore the important difference between values of the same feature (measuring the same property of an instance) across different time points and values of fundamentally different features (measuring different properties of an instance) at the same time point. This thesis presents two main contributions. The first one is the creation of the longitudinal datasets used in the experiments, including the construction of features capturing longitudinal information for predicting age-related diseases. The datasets were created from data in the English Longitudinal Study of Ageing (ELSA) database. The second contribution consists of proposing four new variants of the Correlation-based Feature Selection (CFS) method for selecting features to be used as input by a classification algorithm. These CFS variants take into account (in different ways) the temporal redundancy associated with variations in the value of a feature across different time points. The results are summarised from two main perspectives. Firstly, in terms of predictive accuracy, one of the proposed CFS variants (called Exh-CFS-Gr - exhaustive search-based CFS per group of temporally redundant features) obtained a statistically significantly better predictive performance than the performance obtained by standard CFS and the baseline approach of no feature selection when using Naïve Bayes as the classification algorithm. However, there was no statistically significant difference between the predictive accuracies obtained by J48, a decision tree induction algorithm, for all different variants of CFS (including standard CFS).
Secondly, regarding the feature subsets selected by different variants of CFS, the number of features selected by Exh-CFS-Gr was substantially greater than that of the other three CFS variants for all datasets. This helps explain why this feature selection method obtained the best results in the experiments with Naïve Bayes; i.e., it seems that the other CFS variants selected too few features for Naïve Bayes. Additionally, the features originally observed in the ELSA database were, in general, selected more often (by all variants of CFS) than the constructed features capturing longitudinal information.
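
The idea of running CFS separately within each group of temporally redundant features can be illustrated with a minimal sketch. It is not the thesis's implementation of Exh-CFS-Gr, and it rests on several assumptions: a "group" is taken to be the values of one feature across waves, absolute Pearson correlation stands in for whatever correlation measure the thesis uses, and the merit function is the standard CFS heuristic k*r_cf / sqrt(k + k*(k-1)*r_ff). All feature names and data values below are hypothetical toy values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / sqrt(vx * vy))


def cfs_merit(subset, columns, target):
    """Standard CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean feature-class correlation over the subset.
    r_cf = sum(pearson(columns[f], target) for f in subset) / k
    # Mean feature-feature correlation over all pairs (0 for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def best_subset_per_group(groups, columns, target):
    """Exhaustively score every non-empty subset within each group
    (assumed here to be one feature measured at several waves) and
    keep the highest-merit subset of each group."""
    selected = []
    for names in groups.values():
        candidates = (
            c for r in range(1, len(names) + 1) for c in combinations(names, r)
        )
        best = max(candidates, key=lambda s: cfs_merit(s, columns, target))
        selected.extend(best)
    return selected


# Hypothetical toy data: two features, each measured at two waves.
columns = {
    "featA_w1": [0, 0, 1, 1],
    "featA_w2": [0, 1, 0, 1],
    "featB_w1": [1, 2, 3, 4],
    "featB_w2": [1, 3, 2, 4],
}
target = [0, 0, 1, 1]
groups = {
    "featA": ["featA_w1", "featA_w2"],
    "featB": ["featB_w1", "featB_w2"],
}

selected = best_subset_per_group(groups, columns, target)
print(selected)
```

Exhaustive search is feasible in this sketch only because each group is small: a group with w waves has 2^w - 1 non-empty subsets, whereas an exhaustive search over all features at once would grow exponentially in the total number of features.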

Item Type: Thesis (Master of Research (MRes))
Thesis advisor: Freitas, Alex
Uncontrolled keywords: classification, feature selection, longitudinal data, age-related diseases
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 28 Mar 2018 11:10 UTC
Last Modified: 16 Feb 2021 13:53 UTC