An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents

Silla Jr, Carlos N. and Kaestner, Celso A.A. (2004) An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents. In: Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science, 2945. Springer pp. 135-141. ISBN 3-540-21006-7. (Full text available)

Abstract

In this paper we present a study comparing the performance of different systems from the literature that automatically segment English documents into sentences. We also describe the difficulties encountered in adapting these systems to Portuguese documents and report the results obtained after the adaptation. We analyzed two systems that use a machine learning approach, MxTerminator and Satz, and a customized system based on fixed rules expressed as regular expressions. The results achieved by the Satz system were surprisingly positive for Portuguese documents.
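
To illustrate what a system based on fixed rules expressed as regular expressions might look like, here is a minimal sketch in Python. It is not the customized system evaluated in the paper: the abbreviation list, the `CANDIDATE` pattern, and the `split_sentences` function are all hypothetical and chosen only for illustration. The rule treats `.`, `!` or `?` followed by whitespace and an uppercase letter as a sentence boundary, unless the period ends a known abbreviation.

```python
import re

# Hypothetical abbreviation list (English and Portuguese); not from the paper.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "sr", "sra", "etc"}

# A candidate boundary is terminal punctuation followed by whitespace.
CANDIDATE = re.compile(r"([.!?])\s+")


def split_sentences(text):
    """Split text into sentences using two fixed rules (illustrative only)."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        next_pos = match.end()
        # Rule 1: the next sentence must begin with an uppercase letter.
        if next_pos >= len(text) or not text[next_pos].isupper():
            continue
        # Rule 2: a period right after a known abbreviation is not a boundary.
        tokens = text[start:match.start(1)].split()
        last_token = tokens[-1].lower() if tokens else ""
        if match.group(1) == "." and last_token in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.end(1)].strip())
        start = next_pos
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Dr. Smith arrived. He was late."))
    print(split_sentences("O Sr. Silva chegou. Ele estava atrasado."))
```

Adapting a rule set like this to another language mostly means swapping the abbreviation list, which hints at why porting such systems between English and Portuguese needs language-specific resources.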

Item Type: Conference or workshop item (Paper)
Subjects: Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming,
Divisions: Faculties > Science Technology and Medical Studies > School of Computing > Applied and Interdisciplinary Informatics Group
Depositing User: Mark Wheadon
Date Deposited: 29 Mar 2010 12:16
Last Modified: 18 Jul 2012 08:41
Resource URI: http://kar.kent.ac.uk/id/eprint/24119 (The current URI for this page, for reference purposes)