Mills, Jon Lemmatisation of the Corpus of Cornish. In: Workshop on Language Resources for European Minority Languages, LREC First International Conference on Language Resources and Evaluation, May 27, 1998, Granada, Spain. (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:8360)
Abstract
Today Cornish is spoken by many Cornish people as a second language as a result of the Cornish language revival that has taken place during this century. However, there remains some controversy with regard to which spelling system should form the pedagogical basis of this revival. In order to bring about a consensus on how to spell revived Cornish, reference needs to be made to the traditional material from the historical corpus. The original spelling of the texts is not consistent. This leads to obvious difficulties when asking a computer to find a particular lexeme for analysis, since a search has to be made not only for all the inflected and mutated forms that the item can take, but also for the many possible spellings of those forms. Lemmatisation may be defined as the creation of the base form corresponding to a given word form. This is usually achieved by transforming the word form. Computer-assisted lemmatisation of the corpus was achieved with the aid of a specially written program called VOLTA (Vertical Output Lemma Tagging Aid). Tokenising the text manually proved slow and bothersome. It was, therefore, decided to develop a way of semi-automating tokenisation that could be integrated into VOLTA. The completed lemmatised Corpus of Cornish will provide data to support the study of Cornish syntax, morphology, lexicology and pragmatics. Finally, movement towards a consensus on a spelling system for Cornish in the twenty-first century may be encouraged and facilitated by a better understanding of Cornish linguistic heritage.
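The search problem described in the abstract can be illustrated with a minimal sketch. This is not VOLTA, whose design is not described in this record. It only assumes a lookup table from surface forms to lemmas and a one-token-per-line ("vertical") output. The function names, the tokenising rule and the four lexicon entries are hypothetical and chosen for illustration; they are not taken from the Corpus of Cornish.

```python
import re

# Hypothetical mini-lexicon mapping surface forms to a lemma.
# Illustrative assumptions only, not data from the Corpus of Cornish.
FORM_TO_LEMMA = {
    "tas": "tas",   # base form
    "das": "tas",   # soft mutation t > d
    "thas": "tas",  # spirant mutation t > th
    "tays": "tas",  # assumed variant spelling
}

def tokenise(text):
    """Split text into lowercase word tokens (a deliberately crude rule)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def lemmatise(text):
    """Return (token, lemma) pairs; unknown forms get None for manual review."""
    return [(tok, FORM_TO_LEMMA.get(tok)) for tok in tokenise(text)]

def vertical(pairs):
    """Render one token per line with its lemma, tab-separated."""
    return "\n".join(f"{tok}\t{lemma or '?'}" for tok, lemma in pairs)

def find_lexeme(pairs, lemma):
    """Collect every surface form that was tagged with the given lemma."""
    return [tok for tok, lem in pairs if lem == lemma]
```

Once each token carries a lemma, a single query on the lemma retrieves all of its mutated and variantly spelled forms, and forms missing from the table are flagged with `?` so that a human can resolve them.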
| Item Type: | Conference or workshop item (Paper) |
|---|---|
| Subjects: | T Technology; P Language and Literature; P Language and Literature > P Philology. Linguistics |
| Divisions: | Divisions > Division of Arts and Humanities > School of Culture and Languages |
| Depositing User: | Francis Mills |
| Date Deposited: | 29 Jun 2011 14:41 UTC |
| Last Modified: | 28 May 2019 13:44 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/8360 (The current URI for this page, for reference purposes) |