Oftadeh, Elaheh (2017) Complex Modelling of Multi-Outcome Data with Applications to Cancer Biology. Doctor of Philosophy (PhD) thesis, University of Kent. (KAR id:65697)
Language: English
Abstract
In applied scientific areas such as economics, finance, biology, and medicine, it is often necessary to find the relationship between a set of independent variables (predictors) and a set of response variables (i.e., outcomes of an experiment). If we model individual outcomes separately, we may lose information about the correlation among outcomes. It is therefore desirable to model these outcomes simultaneously using multivariate linear regression.
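The point about separate modelling can be illustrated with a minimal sketch. It is not taken from the thesis: the simulated data, the coefficients, and the helper names `fit_line` and `pearson` are all assumptions made for illustration. Two outcomes share a noise component, each is regressed on the same predictor on its own, and the residuals of the two fits remain strongly correlated, which is the information a one-outcome-at-a-time analysis never uses.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


# Hypothetical simulated data: one predictor, two outcomes with shared noise.
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
shared = [rng.gauss(0, 1) for _ in range(n)]  # noise common to both outcomes
y1 = [2.0 * xi + si + 0.5 * rng.gauss(0, 1) for xi, si in zip(x, shared)]
y2 = [-1.0 * xi + si + 0.5 * rng.gauss(0, 1) for xi, si in zip(x, shared)]

# Fit each outcome separately, ignoring the other.
slope1, intercept1 = fit_line(x, y1)
slope2, intercept2 = fit_line(x, y2)

# Residuals from the two separate fits.
res1 = [yi - (slope1 * xi + intercept1) for xi, yi in zip(x, y1)]
res2 = [yi - (slope2 * xi + intercept2) for xi, yi in zip(x, y2)]

# The separate fits recover the slopes, but the residuals stay correlated.
residual_corr = pearson(res1, res2)
print(f"slopes: {slope1:.2f}, {slope2:.2f}; residual correlation: {residual_corr:.2f}")
```

A joint multivariate model can use this residual correlation, for example when estimating the error covariance or when deciding which predictors matter across outcomes.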
With the advent of high-throughput technology, an enormous amount of high-dimensional multivariate regression data is being generated at an extraordinary speed. However, only a small proportion of these data is informative, and the high dimensionality poses a challenge for modern statistics. In this work, we propose methods and algorithms for modelling high-dimensional multivariate regression data. The contributions of this thesis are as follows.
Firstly, we propose two variable screening techniques to reduce the dimension of the predictors. The first is a beamforming-based screening method built on a statistic called SNR. The second is a mixture-based screening method, in which the screening is conducted through the so-called likelihood fusion.
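For readers unfamiliar with variable screening, the general idea can be sketched as follows. This is a generic marginal-correlation screen, swapped in purely for illustration; it is not the beamforming-based SNR statistic or the mixture-based likelihood fusion proposed in the thesis. The function name `screen`, the simulated data, and the choice to score each predictor by its largest absolute correlation with any response are all assumptions.

```python
import random


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def screen(predictor_columns, response_columns, keep):
    """Keep the `keep` predictors with the largest marginal score.

    The score of a predictor is its maximum absolute correlation with any
    response (an illustrative choice, not the statistic used in the thesis).
    Returns the indices of the retained predictors in ascending order.
    """
    scores = []
    for j, column in enumerate(predictor_columns):
        score = max(abs(pearson(column, y)) for y in response_columns)
        scores.append((score, j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])


# Hypothetical simulated data: 50 predictors, only the first two are informative.
rng = random.Random(1)
n, p = 100, 50
predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y1 = [2.0 * a + rng.gauss(0, 1) for a in predictors[0]]  # driven by predictor 0
y2 = [2.0 * b + rng.gauss(0, 1) for b in predictors[1]]  # driven by predictor 1

selected = screen(predictors, [y1, y2], keep=5)
print(selected)
```

Screening of this kind only reduces the candidate set; a subsequent variable selection step is still needed to decide which of the retained predictors enter the final model.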
Secondly, we propose a variable selection method called principal variable analysis (PVA). In PVA, we take into account the correlation between response variables during variable selection. We compare PVA with some well-known variable selection methods in simulation studies, showing that PVA can substantially enhance selection accuracy.
Thirdly, we develop a method for simultaneous clustering and variable selection using the likelihood fusion. We illustrate the properties of the proposed method through simulation studies.
Fourthly, we study a Bayesian clustering problem through a mixture of normal distributions, for which we propose mixing-proportion dependent priors for the component parameters.
Finally, we apply the proposed methods to cancer drug data. These data contain expression levels of 13321 genes across 42 cell lines and the responses of these cell lines to 131 drugs, recorded as fifty percent inhibitory concentration (IC50) values. We identify 37 genes that are important for predicting IC50 values. We find that although the expression levels of these genes are weakly correlated, their regression coefficients are highly correlated. We also identify a regression coefficient-based network between genes and show that 34 of the 37 selected genes have played a role in at least one type of cancer.
Moreover, by applying the likelihood fusion model to real data, we classify the drugs into five groups.
Item Type: | Thesis (Doctor of Philosophy (PhD)) |
---|---|
Thesis advisor: | Zhang, Jian |
Thesis advisor: | Villa, Cristiano |
Uncontrolled keywords: | Multivariate variable selection, Variable screening, Multivariate regressions, Dimension reduction |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science |
Funders: | [37325] UNSPECIFIED |
SWORD Depositor: | System Moodle |
Depositing User: | System Moodle |
Date Deposited: | 09 Jan 2018 16:10 UTC |
Last Modified: | 05 Nov 2024 11:03 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/65697 (The current URI for this page, for reference purposes) |