Kent Academic Repository

A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers

Tang, Peng, Li, Xin, Chen, Yuxin, Qiu, Weidong, Mei, Haochen, Holmes, Allison, Li, Fenghua, Li, Shujun (2026) A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers. IEEE Transactions on Dependable and Secure Computing. ISSN 1545-5971. (In press) (doi:10.1109/TDSC.2026.3677283) (KAR id:115473)

Abstract

Machine learning (ML)-based classifiers that take a privacy policy as input and predict relevant concepts are useful in applications such as (semi-)automated compliance analysis against the requirements of a specific data protection law, for example the EU GDPR. Although many researchers have studied ML-based privacy policy concept classifiers, we observed multiple research gaps, e.g., the lack of a more complete GDPR taxonomy and limited consideration of hierarchical information in privacy policies. To fill these gaps, we produced a more complete GDPR-oriented privacy policy concept taxonomy, constructed the first privacy policy corpus with explicit hierarchical information at three levels, and conducted the most comprehensive performance evaluation study of GDPR concept classifiers for privacy policies, covering many aspects that have not been studied systematically. Our work led to multiple findings and insights, including the usefulness of considering hierarchical contextual features and different hierarchical structures, the observation that a “one size fits all” approach may not work, the reduced performance of such classifiers on our newly constructed corpus, especially beyond the first level, and the need to split the training and testing sets by document.
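
The last finding in the abstract, splitting the training and testing sets by document, can be illustrated with a minimal sketch. The idea is that all segments of one privacy policy end up on the same side of the split, so a classifier is never tested on text from a policy it saw during training. This is not the authors' procedure, which this record does not describe. The function name split_by_document, the (doc_id, text, label) tuple layout, the 20% test fraction and the toy data are all assumptions made for illustration.

```python
import random


def split_by_document(segments, test_fraction=0.2, seed=0):
    """Split (doc_id, text, label) tuples so no document spans both sets.

    The tuple layout and parameter names are hypothetical, chosen for
    illustration only.
    """
    # Collect the distinct documents and shuffle them reproducibly.
    doc_ids = sorted({doc_id for doc_id, _, _ in segments})
    rng = random.Random(seed)
    rng.shuffle(doc_ids)

    # Hold out whole documents (at least one) for testing.
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_docs = set(doc_ids[:n_test])

    train = [s for s in segments if s[0] not in test_docs]
    test = [s for s in segments if s[0] in test_docs]
    return train, test


# Toy data: 5 policies with 3 segments each; labels are placeholders.
segments = [
    (f"policy_{d}", f"segment {i} of policy {d}", "placeholder_label")
    for d in range(5)
    for i in range(3)
]
train, test = split_by_document(segments)
train_docs = {s[0] for s in train}
test_docs = {s[0] for s in test}
```

A random split over individual segments would instead let segments of the same policy appear in both sets, which can make measured performance look better than it would be on unseen policies.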

Item Type: Article
DOI/Identification number: 10.1109/TDSC.2026.3677283
Uncontrolled keywords: GDPR, taxonomy, privacy policy, corpus, machine learning, legal compliance, concept classifier
Subjects: H Social Sciences > HF Commerce > HF5548.32 E-commerce
K Law > K Law (General)
Q Science > QA Mathematics (inc Computing science)
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101 Telecommunications > TK5105.888 World Wide Web
Institutional Unit: Schools > School of Computing
Institutes > Institute of Cyber Security for Society
Former Institutional Unit:
There are no former institutional units.
Depositing User: Shujun Li
Date Deposited: 26 May 2026 21:21 UTC
Last Modified: 27 May 2026 02:43 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/115473 (The current URI for this page, for reference purposes)
