Li, Jingjie and McLoughlin, Ian Vince and Song, Yan (2014) Reconstruction of pitch for whisper-to-speech conversion of Chinese. In: The 9th International Symposium on Chinese Spoken Language Processing. IEEE, pp. 206-210. E-ISBN 978-1-4799-4219-0. (doi:10.1109/ISCSLP.2014.6936709) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:48809)
Official URL: http://dx.doi.org/10.1109/ISCSLP.2014.6936709
Abstract
Whispers are a common and necessary secondary vocal communications mechanism for natural human-to-human dialogue. They are also the primary communications mechanism for many suffering from aphonia, such as laryngectomees. For typical speakers, whispering is a predominantly contextual activity, prompted by either the sensitive nature of information being conveyed or in response to environmental considerations. Given the importance of whispers, especially for tonal languages like Chinese, and the fact that many communications systems assume vocalised speech, much work has been directed towards the conversion of whispers into natural sounding speech. Since pitch information is largely absent in whispers, it is this key f0 information which needs to be supplied during the regeneration process, and which is the focus of much research. GMM-based reconstruction techniques have proven effective at whisper reconstruction, and some recent work has proposed the use of artificial pitch derived from formant harmonics as an alternative. This paper describes a new formulation of the formant-harmonic f0 method, and compares this directly against a novel GMM-based f0 estimator, as well as known correct pitch excitation for parallel utterances.
Item Type: | Book section |
---|---|
DOI/Identification number: | 10.1109/ISCSLP.2014.6936709 |
Uncontrolled keywords: | Cepstral analysis, Chinese, Feature extraction, GMM, GMM-based reconstruction techniques, Joints, Modulation, Speech, Speech processing, Vectors, Whisper speech, aphonia, artificial pitch, formant-harmonic f0 method, laryngectomees, natural human-to-human dialogue, natural language processing, natural sounding speech, parallel utterances, pitch reconstruction, primary communications mechanism, regeneration process, secondary vocal communications mechanism, speech reconstruction, tonal languages, vocalised speech, whisper conversion, whisper reconstruction, whisper-to-speech conversion |
Subjects: | T Technology |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 25 Aug 2015 08:44 UTC |
Last Modified: | 17 Aug 2022 10:58 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/48809 (The current URI for this page, for reference purposes) |