An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents

Silla Jr, Carlos N. and Kaestner, Celso A.A. (2004) An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents. In: Computational Linguistics and Intelligent Text Processing 5th International Conference. Lecture Notes in Computer Science. Springer, Berlin, Germany, pp. 135-141. ISBN 978-3-540-21006-1. E-ISBN 978-3-540-24630-5. (doi:10.1007/b95558) (KAR id:24119)

PDF Language: English
Official URL: https://doi.org/10.1007/b95558

Abstract

In this paper we present a study comparing the performance of different systems from the literature that perform automatic segmentation of text into sentences for English documents. We also describe the difficulties encountered in adapting these systems to Portuguese documents and the results obtained after the adaptation. We analyzed two systems that use a machine learning approach, MxTerminator and Satz, and a customized system based on fixed rules expressed as regular expressions. The results achieved by the Satz system were surprisingly positive for Portuguese documents.
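
As a rough illustration of what a system "based on fixed rules expressed as regular expressions" might look like, the sketch below splits text at a period, exclamation mark, or question mark that is followed by whitespace and an uppercase letter, unless the period ends a known abbreviation. It is not the authors' system: the abbreviation list, the pattern, and the function name `split_sentences` are all assumptions made for this example, and the uppercase range `À-Ý` is only an approximate way to cover accented Portuguese capitals.

```python
import re

# Hypothetical abbreviation list for illustration only; a real system
# would need a much larger, language-specific list.
ABBREVIATIONS = {"dr", "mr", "mrs", "sr", "sra", "prof", "etc"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and then an
# uppercase letter (ASCII or an accented Latin-1 capital, approximately).
BOUNDARY = re.compile(r"([.!?])\s+(?=[A-ZÀ-Ý])")


def split_sentences(text):
    """Split text into sentences using fixed rules (a sketch, not the paper's rules)."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Word immediately before the punctuation mark, within the current sentence.
        words = text[start:match.start(1)].split()
        last_word = words[-1].lower() if words else ""
        # Rule: a period right after a known abbreviation is not a boundary.
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.end(1)].strip())
        start = match.end()  # resume after the whitespace
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Silva chegou. Ele falou com o Sr. Costa! Tudo bem?"):
        print(sentence)
```

Adapting a splitter like this from English to Portuguese would mostly mean changing the abbreviation list and the character classes, which gives a sense of the kind of adaptation work the abstract refers to.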

Item Type:	Book section
DOI/Identification number:	10.1007/b95558
Uncontrolled keywords:	Regular Expression, Machine Learning Approach, Punctuation Mark, Maximum Entropy Model, English Document
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Mark Wheadon
Date Deposited:	29 Mar 2010 12:16 UTC
Last Modified:	28 Apr 2026 07:42 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/24119 (The current URI for this page, for reference purposes)

University of Kent Author Information

Silla Jr, Carlos N.
