Lemmatisation of the Corpus of Cornish

Mills, Jon (1998) Lemmatisation of the Corpus of Cornish. In: Proceedings of the Workshop on Language Resources for European Minority Languages, First International Conference on Language Resources and Evaluation (LREC), Granada, Spain, May 27, 1998. (The full text of this publication is not available from this repository.)

Abstract

Today Cornish is spoken by many Cornish people as a second language, as a result of the Cornish language revival that has taken place during the twentieth century. However, there remains some controversy over which spelling system should form the pedagogical basis of this revival. To reach a consensus on how to spell revived Cornish, reference needs to be made to the traditional material in the historical corpus. The original spelling of the texts is not consistent. This makes it difficult to ask a computer to find a particular lexeme for analysis, since a search has to be made not only for all the inflected and mutated forms that the item can take, but also for the many possible spellings of each of those forms. Lemmatisation may be defined as the creation of the base form (the lemma) corresponding to a given word form; this is usually achieved by transforming the word form. Computer-assisted lemmatisation of the corpus was achieved with the aid of a specially written program called VOLTA (Vertical Output Lemma Tagging Aid). Tokenising the text manually proved slow and laborious, so it was decided to develop a way of semi-automating tokenisation that could be integrated into VOLTA. The completed lemmatised Corpus of Cornish will provide data that supports the study of Cornish syntax, morphology, lexicology and pragmatics. Finally, movement towards a consensus on a spelling system for Cornish in the twenty-first century may be encouraged and facilitated by a better understanding of Cornish linguistic heritage.
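The search problem described in the abstract (one lexeme, many inflected, mutated and variably spelled forms) can be illustrated with a minimal Python sketch of dictionary-based lemmatisation. This is not VOLTA and does not reproduce its method. The toy lexicon, the variant spelling `penn`, the partial soft-mutation reversal table, the example phrase and the function names `tokenise`, `lemmatise` and `build_index` are all hypothetical, chosen for illustration. Each token is looked up as written and, failing that, with a lenited initial consonant restored to a possible radical form; an index from lemma to attested forms then lets one query retrieve every spelling and mutation of a lexeme.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: radical (unmutated) spelling -> lemma.
# Illustrative only; a real lexicon would be built from the corpus itself.
LEXICON = {
    "pen": "pen",
    "penn": "pen",  # assumed spelling variant of the same lexeme
    "tas": "tas",
    "mab": "mab",
}

# Partial, simplified reversal of soft mutation (lenition):
# mutated initial -> possible radical initials. Other mutations are omitted.
SOFT_MUTATION_REVERSAL = {
    "b": ["p"],
    "d": ["t"],
    "g": ["k", "c"],
    "v": ["b", "m"],
    "dh": ["d"],
    "j": ["ch"],
}

# A word is a run of letters, optionally joined by an apostrophe (e.g. "ha'y").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def tokenise(text):
    """Split text into lower-cased word tokens, dropping punctuation and digits."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def candidate_radicals(form):
    """Yield the form as written, then forms with a lenited initial restored."""
    yield form
    # Check longer mutated initials first so "dh" wins over "d".
    for mutated in sorted(SOFT_MUTATION_REVERSAL, key=len, reverse=True):
        if form.startswith(mutated):
            for radical in SOFT_MUTATION_REVERSAL[mutated]:
                yield radical + form[len(mutated):]
            break  # undo only the longest matching mutated initial


def lemmatise(form):
    """Return the lemma of the first candidate found in the lexicon, else None."""
    for candidate in candidate_radicals(form):
        if candidate in LEXICON:
            return LEXICON[candidate]
    return None


def build_index(tokens):
    """Map each lemma to the set of attested word forms assigned to it."""
    index = defaultdict(set)
    for token in tokens:
        lemma = lemmatise(token)
        if lemma is not None:
            index[lemma].add(token)
    return index


if __name__ == "__main__":
    # Constructed example phrase, not a quotation from the corpus.
    tokens = tokenise("Y ben, y benn, y das ha'y vab.")
    index = build_index(tokens)
    print(sorted(index["pen"]))  # ['ben', 'benn']
```

A realistic version would need a lexicon drawn from the corpus, the other Cornish mutations, and a policy for ambiguous candidates (initial v- may restore to either b- or m-); here the first lexicon hit simply wins.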

Item Type: Conference or workshop item (Paper)
Subjects: T Technology
P Language and Literature
P Language and Literature > P Philology. Linguistics
Divisions: Faculties > Humanities > School of European Culture and Languages
Depositing User: Jon Mills
Date Deposited: 29 Jun 2011 14:41
Last Modified: 29 Jun 2011 14:41
Resource URI: http://kar.kent.ac.uk/id/eprint/8360 (The current URI for this page, for reference purposes)