Thijs, Bart and Glänzel, Wolfgang and Meyer, Martin S. (2017) Improved lexical similarities for hybrid clustering through the use of noun phrases extraction. Working paper. University of Leuven, Leuven (KAR id:60719)
PDF (Working Paper)
Publisher pdf
Language: English
Official URL: https://lirias.kuleuven.be/bitstream/123456789/572...
Abstract
Clustering of hybrid document networks that combine citation-based links with lexical similarities has long suffered from the different properties of the two underlying networks. In this paper we evaluate different options for processing noun phrases, extracted from abstracts with natural language processing, to improve the measurement of the lexical component. Term shingles of different lengths are created from each extracted noun phrase. We discuss twenty different extraction-shingling scenarios and compare their results. Some scenarios show no improvement over the single-term lexical approach used so far. However, when all single-term shingles are removed from the dataset, the lexical network has properties comparable with those of a network based on bibliographic coupling. Next, hybrid networks are built from a weighted combination of the two types of similarity, using seven different weights. We demonstrate that removing all single-term shingles gives the best results in terms of computational feasibility and comparability with bibliographic coupling, as well as in a community detection application.
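The shingling and combination steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the noun phrases have already been extracted, treats a shingle as a contiguous run of words inside one noun phrase, weights shingles by raw counts, and measures lexical similarity with cosine similarity. The abstract does not say how the lexical and bibliographic-coupling similarities are combined, so a plain linear mix with a weight `weight` is assumed here. All function names (`shingles`, `doc_vector`, `cosine`, `hybrid`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def shingles(phrase, max_len=3, drop_single=False):
    """Hypothetical helper: contiguous word n-grams (shingles) of a noun phrase.

    Lengths run from 1 to max_len; with drop_single=True, single-term
    shingles are skipped, mirroring the scenario the abstract favours.
    """
    words = phrase.lower().split()
    min_len = 2 if drop_single else 1
    out = []
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def doc_vector(noun_phrases, max_len=3, drop_single=False):
    """Bag of shingles for one document (assumed raw-count weighting)."""
    counts = Counter()
    for phrase in noun_phrases:
        counts.update(shingles(phrase, max_len, drop_single))
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid(lexical_sim, coupling_sim, weight):
    """Assumed linear combination; weight in [0, 1] is the lexical share."""
    return weight * lexical_sim + (1 - weight) * coupling_sim


if __name__ == "__main__":
    # Toy noun phrases, invented for illustration only.
    d1 = ["hybrid document networks", "lexical similarities"]
    d2 = ["document networks", "citation based links"]
    # With single terms kept, the shared words "document" and "networks"
    # add overlap; dropping them leaves only the shared bigram.
    print(round(cosine(doc_vector(d1), doc_vector(d2)), 3))  # 0.333
    print(round(cosine(doc_vector(d1, drop_single=True),
                       doc_vector(d2, drop_single=True)), 3))  # 0.25
```

In this toy case, removing single-term shingles lowers the similarity that comes only from shared common words. Sweeping `weight` over a set of values would then give one hybrid network per weight, analogous to the seven weights mentioned in the abstract; the actual values used in the paper are not stated here.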
| Item Type: | Reports and Papers (Working paper) |
|---|---|
| Subjects: | H Social Sciences; Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science |
| Divisions: | Divisions > Kent Business School - Division > Kent Business School (do not use) |
| Depositing User: | Martin Meyer |
| Date Deposited: | 04 Mar 2017 22:27 UTC |
| Last Modified: | 05 Nov 2024 10:54 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/60719 (The current URI for this page, for reference purposes) |