Kent Academic Repository

Document Clustering and Text Summarization

Neto, Joel Larocca and Santos, Alexandre D. and Kaestner, Celso A.A. and Freitas, Alex A. (2000) Document Clustering and Text Summarization. In: Mackin, N., ed. Proceedings of the Fourth International Conference on the Practical Application of Knowledge Discovery and Data Mining. The Practical Application Company, pp. 41-55. ISBN 1-902426-08-8. (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:21916)

Abstract

This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their corresponding counterpart in “conventional” data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system document clustering is performed by using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency – inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency – inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.
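
The TF-ISF idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything below is an assumption made by analogy with TF-IDF: TF is taken as a word's count within a sentence, ISF as log(N / SF) where N is the number of sentences and SF the number of sentences containing the word, and a sentence's score as the mean TF-ISF of its distinct words. The function names (`split_sentences`, `tokenize`, `tf_isf_scores`, `summarize`), the naive sentence splitter, and the parameter `k` are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stop-word removal or stemming in this sketch.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    """Return one score per sentence: the mean TF-ISF of its distinct words."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # Sentence frequency: in how many sentences each word occurs.
    sf = Counter()
    for words in tokenized:
        sf.update(set(words))

    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        # Assumed form: TF-ISF(w, s) = count of w in s * log(N / SF(w)).
        total = sum(tf[w] * math.log(n / sf[w]) for w in tf)
        scores.append(total / len(tf))
    return scores


def summarize(text, k=2):
    """Select the k highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Under these assumptions, words that occur in few sentences receive a high ISF, so sentences dominated by such words rank highest. For example, `summarize(text, k=3)` would return the three top-scoring sentences of `text` in source order.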

Item Type: Book section
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Mark Wheadon
Date Deposited: 27 Oct 2009 13:04 UTC
Last Modified: 05 Nov 2024 10:00 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/21916 (The current URI for this page, for reference purposes)
