Improved lexical similarities for hybrid clustering through the use of noun phrases extraction

Thijs, Bart and Glänzel, Wolfgang and Meyer, Martin S. (2017) Improved lexical similarities for hybrid clustering through the use of noun phrases extraction. Working paper. University of Leuven, Leuven

PDF (Working Paper) - Publisher pdf
Official URL
https://lirias.kuleuven.be/bitstream/123456789/572...

Abstract

Clustering of hybrid document networks, which combine citation-based links with lexical similarities, has long suffered from the different properties of these two underlying networks. In this paper we evaluate different options for processing noun phrases extracted from abstracts with natural language processing, with the aim of improving the measurement of the lexical component. Term shingles of different lengths are created from each extracted noun phrase. We discuss twenty different extraction-shingling scenarios and compare their results. Some scenarios show no improvement over the single-term lexical approach used so far. However, when all single-term shingles are removed from the dataset, the lexical network has properties comparable with those of a network based on bibliographic coupling. Next, hybrid networks are built from a weighted combination of the two types of similarity, using seven different weights. We demonstrate that removing all single-term shingles gives the best results in terms of computational feasibility, comparability with bibliographic coupling, and performance in a community detection application.
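The shingling and combination steps summarised in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: noun phrases are supplied as plain strings (the NLP extraction step is skipped), shingles are contiguous word n-grams, the lexical and bibliographic coupling similarities are both taken as cosine measures, and the hybrid similarity is a hypothetical linear mix controlled by one weight. The paper's actual similarity and combination formulas may differ; all function names, phrases, references and weights below are illustrative only.

```python
# Sketch under stated assumptions; not the method used in the working paper.
import math
from collections import Counter


def shingles(noun_phrase: str, min_len: int = 1, max_len: int = 3) -> list[str]:
    """Return contiguous word n-grams (term shingles) of min_len..max_len words."""
    words = noun_phrase.lower().split()
    result = []
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            result.append(" ".join(words[i:i + n]))
    return result


def profile(noun_phrases: list[str], min_len: int = 1, max_len: int = 3) -> Counter:
    """Pool the shingles of all noun phrases of one document into a count vector.

    min_len=2 corresponds to removing all single-term shingles.
    """
    counts = Counter()
    for phrase in noun_phrases:
        counts.update(shingles(phrase, min_len, max_len))
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two shingle count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bc_cosine(refs_a: set[str], refs_b: set[str]) -> float:
    """Bibliographic coupling as a binary cosine over shared cited references."""
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / math.sqrt(len(refs_a) * len(refs_b))


def hybrid(lexical: float, coupling: float, weight: float) -> float:
    """Hypothetical linear mix; weight is the share given to bibliographic coupling."""
    return weight * coupling + (1.0 - weight) * lexical


if __name__ == "__main__":
    doc_a = ["hybrid document networks", "lexical similarities"]
    doc_b = ["hybrid clustering", "lexical similarities"]
    with_singles = cosine(profile(doc_a, 1), profile(doc_b, 1))
    without_singles = cosine(profile(doc_a, 2), profile(doc_b, 2))
    print(round(with_singles, 3), round(without_singles, 3))  # 0.544 0.354
    coupling = bc_cosine({"r1", "r2", "r3", "r4"}, {"r2", "r3"})
    for w in (0.0, 0.5, 1.0):  # illustrative weights, not the paper's seven
        print(w, round(hybrid(without_singles, coupling, w), 3))
```

In this toy case, dropping single-term shingles (min_len=2) lowers the lexical similarity from about 0.544 to 0.354, because shared single words such as "hybrid" no longer count as overlap; only the shared two-word shingle "lexical similarities" remains.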

Item Type: Monograph (Working paper)
Subjects: H Social Sciences
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: Faculties > Social Sciences > Kent Business School
Depositing User: Martin Meyer
Date Deposited: 04 Mar 2017 22:27 UTC
Last Modified: 29 May 2019 18:46 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/60719 (The current URI for this page, for reference purposes)
Meyer, Martin S.: https://orcid.org/0000-0002-5598-9480