Wang, Tong (2022) New Statistical Approaches to Estimating Mixture Models with Application in Anti-Cancer Drug Studies. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.99500) (KAR id:99500)
Language: English
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Official URL: https://doi.org/10.22024/UniKent/01.02.99500
Abstract
Applications to real data often require dealing simultaneously with potential group structures, high-dimensional features and the relationship between predictors and response variables. Missing data frequently occur throughout the dataset, which complicates the problem further. Meanwhile, with the advent of big data and high-throughput technology, the dimension of the data can easily exceed the sample size, so that the normal equations of ordinary linear regression become degenerate and traditional statistical techniques cannot be applied directly. Nevertheless, typically only a small subset of the variables is informative, in the sense of significantly affecting the dependent variables. To address these issues, in this thesis we develop a model that performs classification, variable selection and parameter estimation simultaneously. The model also accommodates datasets with missing values. Moreover, we improve the methodology further by introducing an l_{q}-norm penalty, which allows the sparsity level to be tuned to the specific needs of researchers.
Using the Bayesian Information Criterion, we select the number of components and the degree of penalisation for the model. Marginal analysis and the k-means clustering method then serve as a dimension reduction step, which facilitates the subsequent application to whole datasets. In the application to anti-cancer drug and screened gene expression data, our methodology performs well at clustering drugs into a finite number of groups and at identifying the related genes that play significant roles in configuring the corresponding groups. With our specific enhancements to the model, including missingness indication and an adjustable sparsity level, our methodology has the potential to be applied to a wide range of scientific datasets, including but not limited to those in economics, finance, biology and physics. Based on the above applications, we also propose another method for determining the number of components in a mixture model, which provides an alternative view of the clustering problem.
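As a rough illustration of choosing the number of mixture components with the Bayesian Information Criterion, the sketch below fits univariate Gaussian mixtures by EM and compares BIC = -2 log L + p log n across candidate values of k. It is not the thesis's model: there is no penalty term, no missing-data handling and no regression structure, and the simulated data, the function names (`fit_gmm_1d`, `bic`) and the quantile-based initialisation are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, mus, sigmas):
    """Log-likelihood of the data under a univariate Gaussian mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def fit_gmm_1d(data, k, n_iter=200):
    """Run EM for a k-component mixture; return the final log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start (an assumption of this sketch): means at evenly
    # spaced sample quantiles, a common spread, and equal weights.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(data) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    sigmas = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            norm = max(sum(dens), 1e-300)
            resp.append([d / norm for d in dens])
        # M-step: weighted updates of weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate spikes
    return log_likelihood(data, weights, mus, sigmas)


def bic(loglik, k, n):
    """BIC with p = 3k - 1 free parameters (k means, k sds, k - 1 weights)."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)


# Simulated data (hypothetical): two well-separated groups of 150 points each.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(150)]
data += [random.gauss(8.0, 1.0) for _ in range(150)]

bic_by_k = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in range(1, 5)}
best_k = min(bic_by_k, key=bic_by_k.get)  # smallest BIC wins
print(bic_by_k, best_k)
```

In the same spirit, a penalty level could be chosen by evaluating BIC over a grid of penalty values for each k, although how the thesis counts effective parameters under penalisation is not stated in the abstract.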
Afterwards, we examine the inherent skewness of the data by resorting to skew normal distributions. After adapting the traditional skew normal density function, we successfully estimate the parameters of a skew normal distribution under different skewness scenarios. The asymptotic distributions of the maximum likelihood estimates for our skew normal distribution are also obtained, with detailed proofs given in the Appendix. Some intriguing asymptotic properties of our skew normal function are discussed later in the same chapter. Lastly, we propose the four-piece distribution family for skew normal mixture models to account for group structure, and it shows good estimation accuracy in the subsequent simulation studies. These simulations indicate that the above models complement the existing R package mclust, which is popular for model-based clustering, classification and density estimation.
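For orientation, the sketch below evaluates the standard (Azzalini) skew normal density f(x) = (2/ω) φ(z) Φ(αz) with z = (x − ξ)/ω, and checks numerically that it integrates to one and that its mean matches ξ + ω δ sqrt(2/π), where δ = α / sqrt(1 + α²). This is the textbook form only; the adapted density and the four-piece family developed in the thesis are not given in the abstract and are not reproduced here. The parameter values and function names are assumptions chosen for illustration.

```python
import math


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Azzalini skew normal density with location xi, scale omega, shape alpha."""
    z = (x - xi) / omega
    return 2.0 / omega * std_normal_pdf(z) * std_normal_cdf(alpha * z)


def skew_normal_mean(xi, omega, alpha):
    """Closed-form mean: xi + omega * delta * sqrt(2 / pi)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return xi + omega * delta * math.sqrt(2.0 / math.pi)


def trapezoid(f, lo, hi, steps):
    """Composite trapezoidal rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# Hypothetical parameter values, chosen only to exercise the functions.
xi, omega, alpha = 1.0, 2.0, 3.0
lo, hi = xi - 10.0 * omega, xi + 10.0 * omega

area = trapezoid(lambda x: skew_normal_pdf(x, xi, omega, alpha), lo, hi, 4000)
numeric_mean = trapezoid(lambda x: x * skew_normal_pdf(x, xi, omega, alpha), lo, hi, 4000)
print(area, numeric_mean, skew_normal_mean(xi, omega, alpha))
```

Setting `alpha=0` recovers the ordinary normal density, which is a convenient sanity check when experimenting with different skewness scenarios.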
Item Type: | Thesis (Doctor of Philosophy (PhD)) |
---|---|
Thesis advisor: | Zhang, Jian |
Thesis advisor: | Bentham, James |
DOI/Identification number: | 10.22024/UniKent/01.02.99500 |
Subjects: | Q Science > QA Mathematics (inc Computing science) |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science |
SWORD Depositor: | System Moodle |
Depositing User: | System Moodle |
Date Deposited: | 12 Jan 2023 15:10 UTC |
Last Modified: | 05 Nov 2024 13:05 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/99500 (The current URI for this page, for reference purposes) |