Kirk, Paul, Griffin, Jim E., Savage, Richard S., Ghahramani, Zoubin, Wild, David L. (2012) Bayesian correlated clustering of integrated multiple datasets. Bioinformatics, 28 (24). pp. 3290-3297. ISSN 1367-4803. (doi:10.1093/bioinformatics/bts595) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:41218)
| Official URL: http://bioinformatics.oxfordjournals.org/content/2... |
Abstract
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets.
Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
| Item Type: | Article |
|---|---|
| DOI/Identification number: | 10.1093/bioinformatics/bts595 |
| Subjects: | Q Science > QA Mathematics (inc Computing science) > QA276 Mathematical statistics |
| Institutional Unit: | Schools > School of Engineering, Mathematics and Physics > Mathematical Sciences |
| Former Institutional Unit: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science |
| Funders: | Engineering and Physical Sciences Research Council (https://ror.org/0439y7842) |
| Depositing User: | Jim Griffin |
| Date Deposited: | 29 May 2014 15:26 UTC |
| Last Modified: | 20 May 2025 11:36 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/41218 (The current URI for this page, for reference purposes) |