Kent Academic Repository

New Statistical Approaches to Estimating Mixture Models with Application in Anti-Cancer Drug Studies

Wang, Tong (2022) New Statistical Approaches to Estimating Mixture Models with Application in Anti-Cancer Drug Studies. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.99500) (KAR id:99500)

Abstract

In applications to real data, it is challenging to deal simultaneously with potential group structures, high-dimensional features and the relationship between predictors and response variables. Missing data are often present throughout the dataset, which complicates the problem further. Meanwhile, with the advent of big data and high-throughput technology, the dimension of the data can easily exceed the sample size. In that case the normal equations of ordinary linear regression are degenerate and traditional statistical techniques cannot be applied directly. Nevertheless, typically only a small subset of the variables is informative, in the sense of significantly affecting the dependent variables. To address these issues, this thesis develops a model that performs classification, variable selection and parameter estimation simultaneously. The model also accommodates datasets with missing values. Moreover, we extend the methodology by introducing an l_{q}-norm penalty, which allows the sparsity level to be tuned to the specific needs of researchers.

Using the Bayesian Information Criterion, we select the number of components and the degree of penalisation for the model. Marginal analysis and the k-means clustering method are then used for dimension reduction, which facilitates the subsequent application to whole datasets. In the application to anti-cancer drug and screened gene expression data, our methodology clusters the drugs into a finite number of groups and screens out the related genes that play significant roles in determining the corresponding groups. With our specific enhancements to the model, including missingness indication and an adjustable sparsity level, the methodology has the potential to be applied to a wide range of scientific datasets, including but not limited to economics, finance, biology, and physics. Based on these applications, we also propose another method for determining the number of components in a mixture model, which provides an alternative view of the clustering problem.
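As a rough illustration of how the Bayesian Information Criterion can be used to choose the number of mixture components, the sketch below fits plain one-dimensional Gaussian mixtures by EM and compares their criterion values. It is not the penalised model developed in the thesis: the data are synthetic, and the function names (`fit_gmm_1d`, `bic`), the quantile-based initialisation and the variance floor are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of a Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, log-likelihood).
    """
    n = len(data)
    xs = sorted(data)
    # Deterministic start: means at evenly spaced sample quantiles (an assumption).
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    overall_var = sum((x - centre) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of the mixing proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor guards against collapse
    loglik = sum(math.log(mixture_pdf(x, weights, means, variances)) for x in data)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2.0 * loglik + n_params * math.log(n)


# Synthetic data (illustration only): two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(10.0, 1.0) for _ in range(100)]

fits = {}
bic_by_k = {}
for k in (1, 2, 3):
    fits[k] = fit_gmm_1d(data, k)
    bic_by_k[k] = bic(fits[k][3], k, len(data))

best_k = min(bic_by_k, key=bic_by_k.get)  # smaller BIC is preferred
print(bic_by_k, best_k)
```

The candidate with the smallest criterion value is selected. In a penalised setting such as the one described above, the same comparison could in principle be run over a grid of component counts and penalty levels, with the parameter count adjusted to the number of non-zero coefficients.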

We then examine the inherent skewness of the data using skew normal distributions. After adapting the traditional skew normal density function, we estimate the parameters of a skew normal distribution under different skewness scenarios. The asymptotic distributions of the maximum likelihood estimates for our skew normal distribution are also obtained, with detailed proofs given in the Appendix. Some asymptotic properties of our skew normal function are discussed later in the same chapter. Lastly, we propose the four-piece distribution family for skew normal mixture models to account for group structure, which shows good estimation accuracy in the subsequent simulation studies. These simulations indicate that the above models complement the existing R package mclust, which is popular for model-based clustering, classification, and density estimation.
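For orientation, the sketch below evaluates the standard (Azzalini) skew normal density, f(x) = (2/omega) * phi(z) * Phi(alpha * z) with z = (x - xi)/omega. This is the traditional form only, not the adapted density or the four-piece family proposed in the thesis. The parameter names `xi`, `omega` and `alpha` follow common usage and are assumptions here.

```python
import math


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Standard skew normal density with location xi, scale omega > 0, shape alpha.

    alpha = 0 recovers the normal density; alpha > 0 gives right skew,
    alpha < 0 gives left skew.
    """
    z = (x - xi) / omega
    return (2.0 / omega) * std_normal_pdf(z) * std_normal_cdf(alpha * z)


# With alpha = 3 the density puts more mass to the right of the location xi.
for x in (-1.0, 0.0, 1.0):
    print(x, skew_normal_pdf(x, alpha=3.0))
```

Setting `alpha` to zero reduces the density to the ordinary normal case, which gives a convenient sanity check for any implementation.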

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Zhang, Jian
Thesis advisor: Bentham, James
DOI/Identification number: 10.22024/UniKent/01.02.99500
Subjects: Q Science > QA Mathematics (inc Computing science)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 12 Jan 2023 15:10 UTC
Last Modified: 05 Nov 2024 13:05 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/99500 (The current URI for this page, for reference purposes)

University of Kent Author Information

Wang, Tong.
