Neto, Joel Larocca and Santos, Alexandre D. and Kaestner, Celso A.A. and Freitas, Alex A. (2000) Document Clustering and Text Summarization. In: Mackin, N., ed. Proceedings of the Fourth International Conference on the Practical Application of Knowledge Discovery and Data Mining. The Practical Application Company, pp. 41-55. ISBN 1-902426-08-8. (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:21916)
Abstract
This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their corresponding counterpart in “conventional” data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system document clustering is performed by using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency – inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency – inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.
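The TF-ISF idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes TF-ISF(w, s) = TF(w, s) × log(N / SF(w)), where N is the number of sentences and SF(w) is the number of sentences containing w, and it assumes a sentence is scored by the average TF-ISF of its distinct words. The function names (`split_sentences`, `tokenize`, `tf_isf_scores`, `summarize`), the regex-based sentence splitter, and the parameter `k` are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal here.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Sentence frequency: number of sentences in which each word occurs.
    sf = Counter(w for words in tokenized for w in set(words))
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        # Assumed formula: TF(w, s) * log(N / SF(w)), summed over distinct words.
        total = sum(tf[w] * math.log(n / sf[w]) for w in tf)
        # Assumed sentence score: average over the sentence's distinct words.
        scores.append(total / len(tf))
    return scores


def summarize(text, k=2):
    # Keep the k highest-scoring sentences, returned in original order.
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Under these assumptions, a word that occurs in every sentence contributes log(N / N) = 0, so sentences made up of rarer words rank higher, which mirrors how TF-IDF discounts terms that appear in every document.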
Item Type: | Book section |
---|---|
Subjects: | Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming, |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Mark Wheadon |
Date Deposited: | 27 Oct 2009 13:04 UTC |
Last Modified: | 05 Nov 2024 10:00 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/21916 (The current URI for this page, for reference purposes) |