Kent Academic Repository

Exponential power mixture model for regression : estimation and variable selection

Cao, Lini (2014) Exponential power mixture model for regression : estimation and variable selection. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.94262) (KAR id:94262)

Abstract

The mixture regression model is an important technique in statistical modelling for investigating the relationship between variables. It has been applied in many fields, such as genetics, finance and biology. In this research, we focus on its application to genetic data. Gene expression data normally contain unknown correlation structures even after normalisation, which poses a great challenge for existing clustering methods such as the Gaussian mixture (GM) model and k-means. Here we use the exponential power distribution to cluster gene expression data robustly by treating the data as a mixture of regressions. The exponential power distribution (EPD) is a scale mixture of Gaussian distributions that has varying shape parameters. In this study we introduce and develop our method based on two different aspects of multiple regression with random errors distributed according to the exponential power distribution. The first aspect is estimation: we use both the Expectation-Maximisation (EM) algorithm and the Newton-Raphson method to estimate the parameters of the exponential power distribution mixture regression models. The second aspect is simultaneous variable selection and clustering: we develop a LASSO-type (least absolute shrinkage and selection operator) method to select only the relevant variables in a large dataset, especially a high-dimensional one. The novelty of this research with regard to the Expectation-Maximisation algorithm is that we convert each penalised mixture regression estimation problem to a LASSO problem. The performance of our method is assessed on both independent and dependent data. We also compare the EPD mixture regression with Gaussian mixture regressions through simulations and real data analyses, and we derive model selection criteria such as AIC, BIC and EBIC for both the EPD mixture and GM models.
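As a rough illustration of the estimation aspect described in the abstract, the sketch below evaluates an exponential power density and the E-step membership probabilities for one observation in a mixture of regressions. It is not taken from the thesis: the parameterisation (location `mu`, scale `sigma`, shape `shape`), the function names `epd_pdf` and `responsibilities`, and the tuple layout of `components` are all assumptions made for this example. Under this parameterisation, `shape=2` gives a Gaussian and `shape=1` gives a Laplace density.

```python
import math


def epd_pdf(x, mu=0.0, sigma=1.0, shape=2.0):
    """Exponential power density under one common parameterisation
    (an assumption; the thesis may use a different one):
    f(x) = shape / (2 * sigma * Gamma(1/shape)) * exp(-(|x - mu| / sigma)**shape)
    """
    norm = shape / (2.0 * sigma * math.gamma(1.0 / shape))
    return norm * math.exp(-(abs(x - mu) / sigma) ** shape)


def responsibilities(y, x, components):
    """E-step for a single observation (y, x) in a mixture of regressions.

    `components` is a hypothetical list of tuples
    (weight, beta, sigma, shape), where `beta` is the list of regression
    coefficients for that component. Returns the posterior probability
    that the observation belongs to each component.
    """
    weighted = []
    for weight, beta, sigma, shape in components:
        # Fitted value of the component's regression line at x.
        fitted = sum(b * xi for b, xi in zip(beta, x))
        weighted.append(weight * epd_pdf(y, fitted, sigma, shape))
    total = sum(weighted)
    return [w / total for w in weighted]
```

In a full EM iteration these probabilities would be computed for every observation and then used as weights when the regression coefficients, scales and mixing weights are updated; that M-step, and the LASSO penalty, are left out of this sketch.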

Item Type: Thesis (Doctor of Philosophy (PhD))
DOI/Identification number: 10.22024/UniKent/01.02.94262
Additional information: This thesis has been digitised by EThOS, the British Library digitisation service, for purposes of preservation and dissemination. It was uploaded to KAR on 25 April 2022 in order to hold its content and record within University of Kent systems. It is available Open Access using a Creative Commons Attribution, Non-commercial, No Derivatives (https://creativecommons.org/licenses/by-nc-nd/4.0/) licence so that the thesis and its author can benefit from opportunities for increased readership and citation. This was done in line with University of Kent policies (https://www.kent.ac.uk/is/strategy/docs/Kent%20Open%20Access%20policy.pdf). If you feel that your rights are compromised by open access to this thesis, or if you would like more information about its availability, please contact us at ResearchSupport@kent.ac.uk and we will seriously consider your claim under the terms of our Take-Down Policy (https://www.kent.ac.uk/is/regulations/library/kar-take-down-policy.html).
Uncontrolled keywords: Genetics--Mathematical models ; Gene expression
Subjects: Q Science > QA Mathematics (inc Computing science)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science
SWORD Depositor: SWORD Copy
Depositing User: SWORD Copy
Date Deposited: 14 Jul 2023 10:09 UTC
Last Modified: 14 Jul 2023 10:09 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/94262 (The current URI for this page, for reference purposes)
