Bayesian correlated clustering of integrated multiple datasets

Kirk, Paul, Griffin, Jim E., Savage, Richard S., Ghahramani, Zoubin, Wild, David L. (2012) Bayesian correlated clustering of integrated multiple datasets. Bioinformatics, 28 (24). pp. 3290-3297. ISSN 1367-4803. (doi:10.1093/bioinformatics/bts595) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:41218)

Official URL: http://bioinformatics.oxfordjournals.org/content/2...

Abstract

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets.
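The role of the agreement parameters can be illustrated with a small sketch. It assumes that, for a single gene, the prior over its component labels across datasets is proportional to the product of the per-dataset mixture weights, multiplied by a factor (1 + phi) for every pair of datasets that place the gene in the same component. That multiplicative form, the function names, and the toy numbers are assumptions made for illustration and are not stated in this record; the paper itself should be consulted for the exact model.

```python
from itertools import combinations, product


def joint_allocation_prior(weights, phi):
    """Toy joint prior over one gene's component labels in several datasets.

    weights[k][c] is the mixture weight of component c in dataset k.
    phi[(k, l)] is a non-negative agreement parameter for the pair k < l.
    Assumed form (illustrative only): product of weights, times
    (1 + phi[(k, l)]) whenever datasets k and l share the same label.
    Returns a dict mapping each label tuple to its normalised probability.
    """
    n_datasets = len(weights)
    n_components = len(weights[0])
    unnormalised = {}
    # Enumerate every combination of labels; feasible only for toy sizes.
    for alloc in product(range(n_components), repeat=n_datasets):
        value = 1.0
        for k, c in enumerate(alloc):
            value *= weights[k][c]
        for k, l in combinations(range(n_datasets), 2):
            if alloc[k] == alloc[l]:
                value *= 1.0 + phi[(k, l)]
        unnormalised[alloc] = value
    total = sum(unnormalised.values())
    return {alloc: value / total for alloc, value in unnormalised.items()}


def agreement_probability(prior, k, l):
    """Prior probability that datasets k and l give the gene the same label."""
    return sum(p for alloc, p in prior.items() if alloc[k] == alloc[l])


if __name__ == "__main__":
    toy_weights = [[0.5, 0.5], [0.5, 0.5]]  # two datasets, two components
    for strength in (0.0, 3.0):
        prior = joint_allocation_prior(toy_weights, {(0, 1): strength})
        print(strength, agreement_probability(prior, 0, 1))
```

In this toy setting, phi = 0 makes the two datasets independent, so the gene lands in the same component in both with probability 0.5. Raising phi to 3 lifts that probability to 0.8, which is the sense in which a larger parameter encodes stronger agreement between datasets.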

Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.

Item Type:	Article
DOI/Identification number:	10.1093/bioinformatics/bts595
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA276 Mathematical statistics
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science
Funders:	Engineering and Physical Sciences Research Council (https://ror.org/0439y7842)
Depositing User:	Jim Griffin
Date Deposited:	29 May 2014 15:26 UTC
Last Modified:	12 Jul 2022 10:40 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/41218 (The current URI for this page, for reference purposes)

University of Kent Author Information

Griffin, Jim E.

Creator's ORCID:	https://orcid.org/0000-0002-4828-7368