Kent Academic Repository

Bayesian Nonparametric Methods for Cyber Security with Applications to Malware Detection and Classification

Perusquia Cortes, Jose Antonio (2022) Bayesian Nonparametric Methods for Cyber Security with Applications to Malware Detection and Classification. Doctor of Philosophy (PhD) thesis, University of Kent. (doi:10.22024/UniKent/01.02.93553) (KAR id:93553)


The statistical approach to cyber security has become an active and important area of research owing to the growing number and severity of cyber attacks. In this thesis, we centre our attention on the Bayesian approach to cyber security, which provides several modelling advantages, such as the flexibility achieved through the probabilistic quantification of uncertainty. In particular, we have found that Bayesian models have mainly been used to detect volume-traffic anomalies, network anomalies and malicious software. To provide a unifying view of these ideas, we first present a thorough review of Bayesian methods applied to cyber security.

The use of Bayesian models to detect malware and classify it into known malicious classes is one of the cyber security areas discussed in our review. However, in contrast to the detection of traffic and network anomalies, this area has not been widely developed from a Bayesian perspective. We have therefore centred our attention on developing novel supervised Bayesian nonparametric models to detect and classify malware using binary features built directly from the executables’ binary code. For these methods, important theoretical properties and simulation techniques are fully developed and, for real malware data, we have compared their performance against well-known machine learning models that have been widely applied in this area.

With respect to our methodologies, we first present a new discrete nonparametric prior specifically designed for binary data that builds on an elegant nonparametric hierarchical structure, which allows us to study the importance of each individual feature across the groups found in the data. Moreover, because the number of features is large and possibly redundant, we have developed a generalised version of the model that allows the introduction of a feature selection step within the inferential learning. Finally, for more complex modelling where dependence across the features needs to be introduced, we have extended the capabilities of this new class of nonparametric priors by using it as the building block of a latent feature model.

Item Type: Thesis (Doctor of Philosophy (PhD))
DOI/Identification number: 10.22024/UniKent/01.02.93553
Subjects: H Social Sciences > HA Statistics
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Mathematics, Statistics and Actuarial Science
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 14 Mar 2022 09:23 UTC
Last Modified: 15 Mar 2022 16:09 UTC

University of Kent Author Information

Perusquia Cortes, Jose Antonio.
