Kent Academic Repository

A Logical Approach to the Lemmatisation of Computational Lexica

Mills, Jon (1999) A Logical Approach to the Lemmatisation of Computational Lexica. In: VI Simposio Internacional de Comunicación Social, 25-28 Jan 1999, Santiago de Cuba, Cuba. (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:8339)

Abstract

Lemmatisation is a crucial part of the compilation of a computational lexicon; it is the process which determines the selection and presentation of lemmata. Lemmatisation is a non-trivial task; it consists of a good deal more than deinflection to identify a baseform that can serve as a headword. The lemma has three functions: to uniquely identify the lexical unit, to locate it in the system, and to describe its form. The computational lexicographer is confronted by a number of problems. Homographs need to be distinguished. Several variants of the baseform may exist from which a preferred form will have to be chosen. Compounds may be written as solid, hyphenated or as two words. A way has to be found to treat multi-word lexemes. It may be necessary to give some very common affixes main-entry status. A decision has to be made whether to treat derivatives as main entries with cross reference to the baseform or regroup them under the baseform. A solution is suggested in which the preferred baseform together with a number of other distinguishers may be satisfactorily employed to fulfil all the functions of the lemma. Next it is shown how these elements may be placed within a logical framework to implement computational lexica. The model is then extended to deal with problems of asymmetry in interlingual lemmatisation.
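The idea of a lemma built from a preferred baseform plus further distinguishers can be illustrated with a small sketch. Because the full text is not available here, everything in it is an assumption made for illustration: the choice of part of speech and a homograph index as distinguishers, the names `Lemma` and `Lexicon`, and the variant index are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Lemma:
    """Hypothetical lemma: preferred baseform plus assumed distinguishers."""
    baseform: str        # preferred form chosen among the variants
    pos: str             # assumed distinguisher: part of speech
    homograph: int = 1   # assumed distinguisher: homograph index

    def key(self) -> Tuple[str, str, int]:
        # Sort key that locates the lemma within the lexicon.
        return (self.baseform.lower(), self.pos, self.homograph)


@dataclass
class Lexicon:
    entries: Dict[Lemma, dict] = field(default_factory=dict)
    forms: Dict[str, List[Lemma]] = field(default_factory=dict)

    def add(self, lemma: Lemma, variants: Tuple[str, ...] = (), **info) -> None:
        # The lemma must identify its lexical unit uniquely.
        if lemma in self.entries:
            raise ValueError(f"duplicate lemma: {lemma}")
        self.entries[lemma] = dict(info)
        # Index the preferred baseform and every variant spelling.
        for form in (lemma.baseform, *variants):
            self.forms.setdefault(form.lower(), []).append(lemma)

    def lookup(self, form: str) -> List[Lemma]:
        # Return all lemmata reachable from a written form, in lexicon order.
        return sorted(self.forms.get(form.lower(), []), key=Lemma.key)


lex = Lexicon()
# Homographs share a baseform and are separated by the distinguishers.
lex.add(Lemma("bank", "noun", 1), gloss="financial institution")
lex.add(Lemma("bank", "noun", 2), gloss="side of a river")
lex.add(Lemma("bank", "verb", 1), gloss="to deposit money")
# A compound written solid, hyphenated or as two words maps to one lemma.
lex.add(Lemma("teapot", "noun"), variants=("tea-pot", "tea pot"))
```

In this sketch the tuple of baseform and distinguishers serves all three functions named in the abstract: it identifies the unit (duplicates are rejected), locates it (the sort key), and describes its form (the preferred spelling), while variant spellings are kept in a separate index rather than as competing headwords.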

Item Type: Conference or workshop item (Paper)
Subjects: T Technology; P Language and Literature; P Language and Literature > P Philology. Linguistics
Divisions: Divisions > Division of Arts and Humanities > School of Culture and Languages
Depositing User: Francis Mills
Date Deposited: 24 May 2016 11:49 UTC
Last Modified: 16 Nov 2021 09:46 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/8339 (The current URI for this page, for reference purposes)

University of Kent Author Information

Mills, Jon.
