
Document Clustering and Text Summarization

Neto, Joel Larocca and Santos, Alexandre D. and Kaestner, Celso A.A. and Freitas, Alex A. (2000) Document Clustering and Text Summarization. In: Mackin, N., ed. Proceedings of the Fourth International Conference on the Practical Application of Knowledge Discovery and Data Mining. The Practical Application Company, pp. 41-55. ISBN 1-902426-08-8. (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:21916)


Abstract

This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their corresponding counterpart in “conventional” data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system document clustering is performed by using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency – inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency – inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.
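
The TF-ISF idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes that TF is a word's count within a sentence, that ISF is log(N / SF), where N is the number of sentences and SF is the number of sentences containing the word, and that a sentence's score is the sum of TF-ISF over its distinct words divided by its length in tokens. The function names (split_sentences, tokenize, tf_isf_scores, summarize), the regex-based sentence splitter, and the parameter k are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs as words."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    """Return one score per sentence under the assumed TF-ISF definition."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # SF(w): number of sentences in which word w occurs at least once.
    sf = Counter()
    for words in tokenized:
        sf.update(set(words))

    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)  # TF(w, s): count of w in this sentence
        # Sum TF * ISF over distinct words, normalized by sentence length.
        total = sum(tf[w] * math.log(n / sf[w]) for w in tf)
        scores.append(total / len(words))
    return scores


def summarize(text, k=2):
    """Select the k highest-scoring sentences, kept in source order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Under these assumptions, a word that occurs in every sentence has an ISF of zero and contributes nothing, so sentences made up of words that are rare elsewhere in the text score highest. Calling summarize(text, k=3) would return the three top-scoring sentences in their original order.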

Item Type: Book section
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions: Faculties > Sciences > School of Computing > Applied and Interdisciplinary Informatics Group
Depositing User: Mark Wheadon
Date Deposited: 27 Oct 2009 13:04 UTC
Last Modified: 04 Feb 2020 04:03 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/21916 (The current URI for this page, for reference purposes)
Freitas, Alex A.: https://orcid.org/0000-0001-9825-4700