Tang, Peng, Li, Xin, Chen, Yuxin, Qiu, Weidong, Mei, Haochen, Holmes, Allison, Li, Fenghua, Li, Shujun (2026) A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers. IEEE Transactions on Dependable and Secure Computing. ISSN 1545-5971. (In press) (doi:10.1109/TDSC.2026.3677283) (KAR id:115473)
PDF
Author's Accepted Manuscript
Language: English
This work is licensed under a Creative Commons Attribution 4.0 International License.
Official URL: https://doi.org/10.1109/TDSC.2026.3677283
Abstract
Machine learning (ML) based classifiers that take a privacy policy as input and predict relevant concepts are useful in applications such as (semi-)automated compliance analysis against the requirements of a specific data protection law such as the EU GDPR. Although many researchers have studied ML-based privacy policy concept classifiers, we observed multiple research gaps, e.g., the lack of a more complete GDPR taxonomy and limited consideration of hierarchical information in privacy policies. To fill these gaps, we produced a more complete GDPR-oriented privacy policy concept taxonomy, constructed the first privacy policy corpus with explicitly hierarchical information at three levels, and conducted the most comprehensive performance evaluation study of GDPR concept classifiers for privacy policies, covering many aspects that have not been studied systematically. Our work led to multiple findings and insights, including the usefulness of considering hierarchical contextual features and different hierarchical structures, the observation that a “one size fits all” approach may not work, the reduced performance of such classifiers on our newly constructed corpus, especially after the first level, and the necessity of splitting the training and testing sets by document.
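The last finding above, splitting training and testing sets by document, can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_by_document`, the `(doc_id, text, label)` tuple layout, and the placeholder labels are all assumptions made for illustration. The idea is that every segment of a given privacy policy lands on the same side of the split, so a classifier is never tested on text from a policy it saw during training.

```python
import random
from collections import defaultdict


def split_by_document(samples, test_fraction=0.2, seed=0):
    """Split (doc_id, text, label) samples so no document spans both sets.

    Hypothetical helper for illustration; the tuple layout is an assumption.
    """
    # Group every segment under the policy document it came from.
    by_doc = defaultdict(list)
    for sample in samples:
        by_doc[sample[0]].append(sample)

    # Shuffle document ids (not individual segments) reproducibly.
    doc_ids = sorted(by_doc)
    random.Random(seed).shuffle(doc_ids)

    # Hold out whole documents for testing, at least one.
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_docs = set(doc_ids[:n_test])

    train = [s for d in doc_ids if d not in test_docs for s in by_doc[d]]
    test = [s for d in doc_ids if d in test_docs for s in by_doc[d]]
    return train, test


# Toy data: 5 hypothetical policies with 3 segments each; labels are placeholders.
samples = [
    (f"policy_{i}", f"segment {j} of policy {i}", f"label_{j}")
    for i in range(5)
    for j in range(3)
]
train, test = split_by_document(samples, test_fraction=0.2, seed=0)
train_docs = {s[0] for s in train}
test_docs = {s[0] for s in test}
```

By contrast, a split that shuffles individual segments would typically place segments of the same policy in both sets, which is the situation a document-level split is meant to avoid.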
| Item Type: | Article |
|---|---|
| DOI/Identification number: | 10.1109/TDSC.2026.3677283 |
| Uncontrolled keywords: | GDPR, taxonomy, privacy policy, corpus, machine learning, legal compliance, concept classifier |
| Subjects: | H Social Sciences > HF Commerce > HF5548.32 E-commerce; K Law > K Law (General); Q Science > QA Mathematics (inc Computing science); T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101 Telecommunications > TK5105.888 World Wide Web |
| Institutional Unit: | Schools > School of Computing; Institutes > Institute of Cyber Security for Society |
| Former Institutional Unit: | There are no former institutional units. |
| Depositing User: | Shujun Li |
| Date Deposited: | 26 May 2026 21:21 UTC |
| Last Modified: | 27 May 2026 02:43 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/115473 (The current URI for this page, for reference purposes) |
https://orcid.org/0000-0001-8940-798X