Why Current Statistical Approaches to Ransomware Detection Fail

Pont, Jamie, Arief, Budi, Hernandez-Castro, Julio C. (2020) Why Current Statistical Approaches to Ransomware Detection Fail. In: Lecture Notes in Computer Science. Information Security. 23rd International Conference, ISC 2020, Bali, Indonesia, December 16–18, 2020, Proceedings. 12472. Springer ISBN 978-3-030-62973-1. (doi:10.1007/978-3-030-62974-8_12) (KAR id:82960)

PDF Author's Accepted Manuscript
Language: English
Using basic statistical techniques to detect ransomware is a popular and intuitive strategy: statistical tests can identify randomness, which in turn can indicate the presence of encryption and, by extension, a ransomware attack. However, common file formats such as images and compressed data can look random to some of these tests. In this work, we investigate how statistical tests are currently used for ransomware detection, focusing primarily on false positive rates. Our main aim is to show that the current over-reliance on simple statistical tests in anti-ransomware tools can seriously undermine the reliability and consistency of ransomware detection through frequent misclassifications. We determined thresholds for five statistics commonly used to detect randomness: Shannon entropy, chi-square, arithmetic mean, Monte Carlo estimation of Pi, and the serial correlation coefficient. We obtained a large dataset of 84,327 files comprising images, compressed data and encrypted data. We then tested these thresholds (taken from previous publications in the literature where possible) against our dataset, showing that the false positive rate is far beyond what could be considered acceptable. False positive rates were often above 50%, and on several occasions above 90%. False negative rates were generally between 5% and 20%, which is also far too high. From these experiments we conclude that relying on these simple statistical approaches is not sufficient to detect ransomware attacks consistently. We instead recommend exploring higher-order statistics, such as skewness and kurtosis, for future ransomware detection techniques.
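The five statistics listed above are the ones reported by the ent randomness-testing utility. As an illustration only, the sketch below computes them for a byte buffer, loosely following ent's definitions (6-byte chunks as 24-bit points for the Pi estimate, lag-1 serial correlation with wraparound), and adds skewness and excess kurtosis of the byte values as one possible reading of "higher-order statistics". It is not the authors' code. The names `byte_statistics`, `flags_as_encrypted`, `false_positive_rate` and the entropy cut-off `ENTROPY_THRESHOLD = 7.9` are hypothetical placeholders, not thresholds taken from the paper.

```python
import math
import os
import zlib
from collections import Counter


def byte_statistics(data: bytes) -> dict:
    """Compute ent-style randomness statistics plus two higher moments."""
    n = len(data)
    if n == 0:
        raise ValueError("empty input")
    counts = Counter(data)

    # Shannon entropy in bits per byte (maximum 8.0 for uniform bytes).
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    # Chi-square statistic against a uniform distribution over 256 values.
    expected = n / 256
    chi_square = sum((counts.get(b, 0) - expected) ** 2 / expected
                     for b in range(256))

    # Arithmetic mean of byte values (about 127.5 for random data).
    mean = sum(data) / n

    # Monte Carlo estimate of Pi: each complete 6-byte chunk gives a 24-bit
    # (x, y) point; count points inside the quarter circle of radius 2**24 - 1.
    radius = 2 ** 24 - 1
    hits = trials = 0
    for i in range(0, n - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        trials += 1
        if x * x + y * y <= radius * radius:
            hits += 1
    pi_estimate = 4 * hits / trials if trials else float("nan")

    # Lag-1 serial correlation, pairing the last byte with the first.
    s1 = sum(data[i] * data[(i + 1) % n] for i in range(n))
    s2 = sum(data) ** 2
    s3 = sum(b * b for b in data)
    denom = n * s3 - s2
    serial_corr = (n * s1 - s2) / denom if denom else float("nan")

    # Higher-order moments of the byte values (an assumed interpretation).
    m2 = sum((b - mean) ** 2 for b in data) / n
    m3 = sum((b - mean) ** 3 for b in data) / n
    m4 = sum((b - mean) ** 4 for b in data) / n
    skewness = m3 / m2 ** 1.5 if m2 else float("nan")
    excess_kurtosis = m4 / m2 ** 2 - 3 if m2 else float("nan")

    return {
        "entropy": entropy,
        "chi_square": chi_square,
        "mean": mean,
        "monte_carlo_pi": pi_estimate,
        "serial_correlation": serial_corr,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


# Hypothetical cut-off for illustration only; not a value from the paper.
ENTROPY_THRESHOLD = 7.9


def flags_as_encrypted(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Naive detector: flag a buffer as encrypted if its entropy exceeds the threshold."""
    return byte_statistics(data)["entropy"] > threshold


def false_positive_rate(benign: list, threshold: float = ENTROPY_THRESHOLD) -> float:
    """Fraction of non-encrypted buffers that the naive detector flags."""
    flagged = sum(flags_as_encrypted(buf, threshold) for buf in benign)
    return flagged / len(benign)


if __name__ == "__main__":
    random_bytes = os.urandom(1 << 16)
    compressed = zlib.compress(" ".join(str(i * i) for i in range(50000)).encode(), 9)
    for label, buf in [("random", random_bytes), ("zlib", compressed)]:
        print(label, byte_statistics(buf))
```

Running the script prints the statistics for random bytes alongside zlib-compressed data, which is a quick way to see how close compressed content can sit to truly random content under these tests. Swapping the entropy check in `flags_as_encrypted` for any of the other statistics is the kind of single-threshold rule whose false positive rate the paper measures.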

Item Type: Conference or workshop item (Proceeding)
DOI/Identification number: 10.1007/978-3-030-62974-8_12
Uncontrolled keywords: ransomware, anti-ransomware, statistical tests, randomness, entropy, chi-square
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Jamie Pont
Date Deposited: 16 Sep 2020 12:12 UTC
Last Modified: 09 Dec 2022 04:10 UTC