A 'non-parametric' version of the naive Bayes classifier

Soria, Daniele, Garibaldi, Jonathan M., Ambrogi, Federico, Biganzoli, Elia M., Ellis, Ian O. (2011) A 'non-parametric' version of the naive Bayes classifier. Knowledge-Based Systems, 24 (6). pp. 775-784. ISSN 0950-7051. E-ISSN 1872-7409. (doi:10.1016/j.knosys.2011.02.014) (KAR id:98900)

PDF:	Author's Accepted Manuscript
Language:	English
License:	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: https://doi.org/10.1016/j.knosys.2011.02.014

Abstract

Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be normally distributed, but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-normal distributions are observed. © 2011 Elsevier B.V. All rights reserved.
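
Illustrative sketch (not taken from the paper): the abstract does not say how the normality requirement is removed, so the example below substitutes one common distribution-free approach, a per-class, per-feature Gaussian kernel density estimate, while keeping the naive Bayes independence assumption. The class name `KDENaiveBayes`, the rule-of-thumb bandwidth, and the synthetic data are all assumptions made for this example and should not be read as the authors' algorithm.

```python
import numpy as np


class KDENaiveBayes:
    """Naive Bayes with kernel density estimates in place of normal densities.

    Hypothetical illustration only; not the method of Soria et al. (2011).
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors estimated from class frequencies.
        self.log_priors_ = np.log([(y == c).mean() for c in self.classes_])
        # Keep the training points of each class; the KDE is built from them.
        self.samples_ = [X[y == c] for c in self.classes_]
        self.bandwidths_ = []
        for Xc in self.samples_:
            n = Xc.shape[0]
            std = Xc.std(axis=0, ddof=1) if n > 1 else np.ones(Xc.shape[1])
            # Normal-reference rule-of-thumb bandwidth (an assumed choice).
            h = 1.06 * std * n ** (-1 / 5)
            self.bandwidths_.append(np.where(h > 0, h, 1e-3))
        return self

    @staticmethod
    def _log_likelihood(X, Xc, h):
        # z has shape (n_test, n_train, n_features).
        z = (X[:, None, :] - Xc[None, :, :]) / h
        log_kernel = -0.5 * z ** 2 - np.log(h * np.sqrt(2 * np.pi))
        # Stable log of the mean kernel value over the training points.
        m = log_kernel.max(axis=1, keepdims=True)
        log_density = m.squeeze(1) + np.log(np.exp(log_kernel - m).mean(axis=1))
        # Independence assumption: sum the per-feature log densities.
        return log_density.sum(axis=1)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.column_stack([
            lp + self._log_likelihood(X, Xc, h)
            for lp, Xc, h in zip(self.log_priors_, self.samples_, self.bandwidths_)
        ])
        return self.classes_[scores.argmax(axis=1)]


# Synthetic, strongly non-normal data: class 0 is bimodal, class 1 is unimodal.
rng = np.random.default_rng(0)
centres = rng.choice([-3.0, 3.0], size=(200, 1))
X0 = centres + rng.normal(0.0, 0.3, size=(200, 2))
X1 = rng.normal(0.0, 0.3, size=(200, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 200 + [1] * 200)

model = KDENaiveBayes().fit(X_train, y_train)
preds = model.predict([[-3.0, -3.0], [0.0, 0.0], [3.0, 3.0]])
print(preds)
```

The only change from Gaussian naive Bayes in this sketch is the per-feature density estimate; the priors, the product over features, and the arg-max decision rule are unchanged, which mirrors the abstract's description of retaining "the essential structure and other underlying assumptions of the method".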

Item Type:	Article
DOI/Identification number:	10.1016/j.knosys.2011.02.014
Uncontrolled keywords:	Supervised learning, Naive Bayes, logistic regression, Breast cancer, UCI data sets
Subjects:	Q Science > QA Mathematics (inc Computing science)
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Funders:	University of Nottingham (https://ror.org/01ee9ar58)
Depositing User:	Daniel Soria
Date Deposited:	08 Dec 2022 10:22 UTC
Last Modified:	05 Nov 2024 13:04 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/98900 (The current URI for this page, for reference purposes)

University of Kent Author Information

Soria, Daniele.

Creator's ORCID:	https://orcid.org/0000-0002-0164-8218
