Skip to main content
Kent Academic Repository

Improving Performance of Automatic Keyword Extraction (AKE) Methods Using PoS Tagging and Enhanced Semantic-Awareness

Altuncu, Enes, Nurse, Jason R. C., Xu, Yang, Guo, Jie, Li, Shujun (2025) Improving Performance of Automatic Keyword Extraction (AKE) Methods Using PoS Tagging and Enhanced Semantic-Awareness. Information, 16 (7). 601:1-601:22. ISSN 2078-2489. (doi:10.3390/info16070601) (KAR id:110621)

Abstract

Automatic keyword extraction (AKE) has gained more importance with the increasing amount of digital textual data that modern computing systems process. It has various applications in information retrieval (IR) and natural language processing (NLP), including text summarisation, topic analysis and document indexing. This paper proposes a simple but effective post-processing-based universal approach to improving the performance of any AKE methods, via an enhanced level of semantic-awareness supported by PoS tagging. To demonstrate the performance of the proposed approach, we considered word types retrieved from a PoS tagging step and two representative sources of semantic information—specialised terms defined in one or more context-dependent thesauri, and named entities in Wikipedia. The above three steps can be simply added to the end of any AKE methods as part of a post-processor, which simply re-evaluates all candidate keywords following some context-specific and semantic-aware criteria. For five state-of-the-art (SOTA) AKE methods, our experimental results with 17 selected datasets showed that the proposed approach improved their performances both consistently (up to 100% in terms of improved cases) and significantly (between 10.2% and 53.8%, with an average of 25.8%, in terms of F1-score and across all five methods), especially when all the three enhancement steps are used. Our results have profound implications considering the fact that our proposed approach can be easily applied to any AKE method with the standard output (candidate keywords and scores) and the ease to further extend it.

Item Type: Article
DOI/Identification number: 10.3390/info16070601
Uncontrolled keywords: keyword extraction; POS tagging; semantic-awareness; context-awareness
Subjects: Q Science > QA Mathematics (inc Computing science)
Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, > QA76.76.E95 Expert Systems (Intelligent Knowledge Based Systems)
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800 Electronics > TK7880 Applications of electronics > TK7882.P3 Pattern recognition systems
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800 Electronics > TK7880 Applications of electronics > TK7885 Computer engineering. Computer hardware
Institutional Unit: Schools > School of Computing
Institutes > Institute of Cyber Security for Society
Former Institutional Unit:
There are no former institutional units.
Funders: University of Kent (https://ror.org/00xkeyj56)
Depositing User: Shujun Li
Date Deposited: 13 Jul 2025 10:09 UTC
Last Modified: 22 Jul 2025 09:23 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/110621 (The current URI for this page, for reference purposes)

University of Kent Author Information

  • Depositors only (login required):

Total unique views of this page since July 2020. For more details click on the image.