When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification

Liang, Zhanbo, Guo, Jie, Qiu, Weidong, Huang, Zheng, Li, Shujun (2024) When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification. Data Mining and Knowledge Discovery, 38 (3). pp. 1171-1192. ISSN 1384-5810. (doi:10.1007/s10618-023-00992-y) (KAR id:104513)

PDF (Publisher PDF). Language: English. This work is licensed under a Creative Commons Attribution 4.0 International License.
Official URL: https://doi.org/10.1007/s10618-023-00992-y

Abstract

With the rise of Web 2.0 platforms such as online social media, people’s private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. Therefore, it is important to detect such unwanted privacy disclosures to help alert the people affected and the online platform. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as the input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed presentation method combines three different sources of information: the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism is used to combine the first two sources of information, and a graph convolutional network is employed to extract the third source of information, which is then used to help fuse the features extracted from the first two sources. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that our proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods in terms of all key performance indicators.
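
As a rough illustration of two ideas named in the abstract, the sketch below shows one standard graph convolution step over a label graph, followed by independent per-label scoring as used in multi-label classification. It is not the authors' model: it omits the double-attention mechanism entirely, and the label names, adjacency matrix, weights and text vector are hypothetical values chosen only for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a standard GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    h = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in h]

def predict_labels(text_vector, label_embeddings, label_names, threshold=0.5):
    """Score every label independently and keep those at or above the threshold."""
    scores = [sigmoid(sum(t * e for t, e in zip(text_vector, emb)))
              for emb in label_embeddings]
    return [name for name, s in zip(label_names, scores) if s >= threshold]

# Hypothetical labels and label graph: "location" and "occupation" are
# assumed to co-occur; "family" is assumed to be unconnected.
label_names = ["location", "occupation", "family"]
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Identity features and weights (an assumption) keep the arithmetic easy to follow.
label_embeddings = gcn_layer(adjacency, identity, identity)

# Hypothetical text representation with evidence for "location" only.
text_vector = [4.0, 0.0, -4.0]
predicted = predict_labels(text_vector, label_embeddings, label_names)
print(predicted)
```

In this toy setting, the graph convolution mixes the embeddings of the two connected labels, so evidence for "location" also raises the score for "occupation". This is the general intuition behind using label-to-label correlation, shown here under the stated assumptions only.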

Item Type:	Article
DOI/Identification number:	10.1007/s10618-023-00992-y
Additional information:	For the purpose of open access, the author(s) has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Uncontrolled keywords:	Online social media, Privacy disclosure detection, User generated content (UGC), Multi-label text classification (MLTC), Graph convolutional network (GCN)
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 75 Electronic computers. Computer science
	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.76.E95 Expert Systems (Intelligent Knowledge Based Systems)
	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.87 Neural computers, neural networks
	T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101 Telecommunications > TK5105 Data transmission systems > TK5105.5 Computer networks > TK5105.875.I57 Internet
	T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101 Telecommunications > TK5105.888 World Wide Web
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
	University-wide institutes > Institute of Cyber Security for Society
Funders:	Engineering and Physical Sciences Research Council (https://ror.org/0439y7842)
	National Natural Science Foundation of China (https://ror.org/01h0zpd94)
Depositing User:	Shujun Li
Date Deposited:	05 Jan 2024 19:33 UTC
Last Modified:	05 Nov 2024 13:10 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/104513 (The current URI for this page, for reference purposes)

University of Kent Author Information

Li, Shujun.

Creator's ORCID:	https://orcid.org/0000-0001-5628-7328