Why Current Statistical Approaches to Ransomware Detection Fail

Pont, Jamie, Arief, Budi, Hernandez-Castro, Julio C. (2020) Why Current Statistical Approaches to Ransomware Detection Fail. In: Information Security. 23rd International Conference, ISC 2020, Bali, Indonesia, December 16–18, 2020, Proceedings. Springer ISBN 978-3-030-62973-1. (doi:10.1007/978-3-030-62974-8_12) (KAR id:82960)

PDF (Author's Accepted Manuscript). Language: English
Official URL: https://doi.org/10.1007/978-3-030-62974-8_12

Abstract

Using basic statistical techniques to detect ransomware is a popular and intuitive strategy: statistical tests can identify randomness, which in turn can indicate the presence of encryption and, by extension, a ransomware attack. However, common file formats such as images and compressed data can look random to some of these tests. In this work, we investigate the widespread use of statistical tests in ransomware detection, focusing primarily on false positive rates. Our main aim is to show that the current over-dependence on simple statistical tests within anti-ransomware tools can seriously undermine the reliability and consistency of ransomware detection, in the form of frequent false classifications. We determined thresholds for five key statistics frequently used to detect randomness, namely Shannon entropy, chi-square, arithmetic mean, Monte Carlo estimation of Pi and serial correlation coefficient. We obtained a large dataset of 84,327 files comprising images, compressed data and encrypted data. We then tested these thresholds (taken from a variety of previous publications in the literature where possible) against our dataset, showing that the false positive rates are far beyond what could be considered acceptable: they were often above 50%, and on several occasions above 90%. False negative rates were also generally between 5% and 20%, which is likewise far too high. As a direct result of these experiments, we conclude that relying on these simple statistical approaches is not good enough to detect ransomware attacks consistently. We instead recommend exploring higher-order statistics such as skewness and kurtosis for future ransomware detection techniques.
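
The five statistics named in the abstract can be sketched in Python as follows. This is a minimal illustration, assuming the conventions of John Walker's ent test program (entropy in bits per byte, chi-square against a uniform byte distribution, Monte Carlo points built from 6-byte groups with 24-bit coordinates, and a serial correlation that wraps the last byte around to the first); it is not the paper's own implementation, and it deliberately omits the literature thresholds the paper evaluates. The function names `randomness_stats` and `error_rates` are hypothetical.

```python
import math
from collections import Counter


def randomness_stats(data: bytes) -> dict:
    """Compute five ent-style randomness statistics over a byte string."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one byte")
    counts = Counter(data)  # frequency of each byte value 0..255

    # 1. Shannon entropy in bits per byte; 8.0 means every byte value is equally likely.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    # 2. Chi-square statistic against a uniform distribution over the 256 byte values.
    expected = n / 256
    chi_square = sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

    # 3. Arithmetic mean of the byte values; about 127.5 for uniform random data.
    mean = sum(data) / n

    # 4. Monte Carlo estimate of Pi: each 6-byte group becomes a point with 24-bit
    #    x and y coordinates; the fraction inside the quarter circle approximates Pi/4.
    radius = 2 ** 24 - 1
    points = inside = 0
    for i in range(0, n - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        points += 1
        if x * x + y * y <= radius * radius:
            inside += 1
    monte_carlo_pi = 4 * inside / points if points else math.nan

    # 5. Serial correlation coefficient between each byte and the next one,
    #    wrapping the last byte around to the first; near 0 for random data.
    s1 = sum(data)
    s2 = sum(b * b for b in data)
    s_cross = sum(data[i] * data[(i + 1) % n] for i in range(n))
    denom = n * s2 - s1 * s1
    serial_corr = (n * s_cross - s1 * s1) / denom if denom else math.nan

    return {
        "entropy": entropy,
        "chi_square": chi_square,
        "mean": mean,
        "monte_carlo_pi": monte_carlo_pi,
        "serial_correlation": serial_corr,
    }


def error_rates(predicted, actual):
    """Return (false positive rate, false negative rate).

    Each element is True when a file is flagged as encrypted (predicted)
    or really is encrypted (actual).
    """
    pairs = list(zip(predicted, actual))
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tp = sum(1 for p, a in pairs if p and a)
    fpr = fp / (fp + tn) if fp + tn else math.nan
    fnr = fn / (fn + tp) if fn + tp else math.nan
    return fpr, fnr
```

Uniformly random bytes (for example `os.urandom(1 << 20)`) should give an entropy near 8, a mean near 127.5, a Pi estimate near 3.14 and a serial correlation near 0. The abstract's point is that images and compressed files can land in the same region, so a detector that thresholds these values flags them as encrypted, which `error_rates` counts as false positives (FP / (FP + TN)).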

Item Type:	Conference proceeding
DOI/Identification number:	10.1007/978-3-030-62974-8_12
Uncontrolled keywords:	ransomware, anti-ransomware, statistical tests, randomness, entropy, chi-square
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Jamie Pont
Date Deposited:	16 Sep 2020 12:12 UTC
Last Modified:	28 Apr 2026 09:13 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/82960 (The current URI for this page, for reference purposes)

University of Kent Author Information

Pont, Jamie.

Creator's ORCID:	https://orcid.org/0000-0003-0969-2464

Arief, Budi.

Creator's ORCID:	https://orcid.org/0000-0002-1830-1587

Hernandez-Castro, Julio C.

Creator's ORCID:	https://orcid.org/0000-0002-6432-5328