Comparing enrichment analysis and machine learning for identifying gene properties that discriminate between gene classes

Fabris, Fabio, Palmer, Daniel, de Magalhaes, João Pedro, Freitas, Alex A. (2020) Comparing enrichment analysis and machine learning for identifying gene properties that discriminate between gene classes. Briefings in Bioinformatics, 21 (3). pp. 803-814. ISSN 1477-4054. (doi:10.1093/bib/bbz028) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:73238)

Official URL: https://doi.org/10.1093/bib/bbz028

Abstract

Biologists very often use enrichment methods based on statistical hypothesis tests to identify gene properties that are significantly over-represented in a given set of genes of interest, by comparison with a ‘background’ set of genes. These enrichment methods, although based on rigorous statistical foundations, are not always the best single option to identify patterns in biological data. In many cases, one can also use classification algorithms from the machine-learning field. Unlike enrichment methods, classification algorithms are designed to maximize measures of predictive performance and are capable of analysing combinations of gene properties, instead of one property at a time. In practice, however, the majority of studies use either enrichment or classification methods (rather than both), and there is a lack of literature discussing the pros and cons of both types of method. The goal of this paper is to compare and contrast enrichment and classification methods, offering two contributions. First, we discuss the (to some extent complementary) advantages and disadvantages of both types of methods for identifying gene properties that discriminate between gene classes. Second, we provide a set of high-level recommendations for using enrichment and classification methods. Overall, by highlighting the strengths and the weaknesses of both types of methods we argue that both should be used in bioinformatics analyses.
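As a rough illustration of the kind of enrichment test the abstract describes, the sketch below computes a one-sided hypergeometric p-value. It asks whether a single gene property is over-represented in a set of genes of interest relative to a background set. The function name, parameter names and gene counts are assumptions made for this example and are not taken from the paper, which may discuss other tests. The function scores one property at a time, which is the limitation the abstract contrasts with classification algorithms.

```python
from math import comb


def enrichment_p_value(n_background, n_with_property, n_interest, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `n_overlap` genes with the property when
    drawing `n_interest` genes at random, without replacement, from a
    background of `n_background` genes of which `n_with_property` have it.
    """
    # The overlap cannot exceed either the property set or the gene set.
    max_overlap = min(n_with_property, n_interest)
    # Number of ways to choose the genes of interest from the background.
    total = comb(n_background, n_interest)
    # Sum the upper tail: overlaps at least as large as the one observed.
    tail = sum(
        comb(n_with_property, k)
        * comb(n_background - n_with_property, n_interest - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = enrichment_p_value(
    n_background=1000, n_with_property=50, n_interest=40, n_overlap=8
)
print(f"p = {p:.3g}")
```

In practice such a test would be repeated for every property of interest, with a correction for multiple testing applied to the resulting p-values.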

Item Type:	Article
DOI/Identification number:	10.1093/bib/bbz028
Uncontrolled keywords:	machine learning; enrichment analysis; classification; statistical hypothesis testing
Subjects:	Q Science
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Fabio Fabris
Date Deposited:	27 Mar 2019 14:03 UTC
Last Modified:	28 Apr 2026 08:59 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/73238 (The current URI for this page, for reference purposes)

University of Kent Author Information

Fabris, Fabio.

Creator's ORCID:	https://orcid.org/0000-0001-7159-4668

Freitas, Alex A.

Creator's ORCID:	https://orcid.org/0000-0001-9825-4700