Reconstruction of pitch for whisper-to-speech conversion of Chinese

Li, Jingjie and McLoughlin, Ian Vince and Song, Yan (2014) Reconstruction of pitch for whisper-to-speech conversion of Chinese. In: The 9th International Symposium on Chinese Spoken Language Processing. IEEE, pp. 206-210. E-ISBN 978-1-4799-4219-0. (doi:10.1109/ISCSLP.2014.6936709) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:48809)

Official URL: http://dx.doi.org/10.1109/ISCSLP.2014.6936709
Additional URLs: http://ieeexplore.ieee.org/lpdocs/epic03...

Abstract

Whispers are a common and necessary secondary vocal communications mechanism for natural human-to-human dialogue. They are also the primary communications mechanism for many suffering from aphonia, such as laryngectomees. For typical speakers, whispering is a predominantly contextual activity, prompted by either the sensitive nature of information being conveyed or in response to environmental considerations. Given the importance of whispers, especially for tonal languages like Chinese, and the fact that many communications systems assume vocalised speech, much work has been directed towards the conversion of whispers into natural sounding speech. Since pitch information is largely absent in whispers, it is this key f0 information which needs to be supplied during the regeneration process, and which is the focus of much research. GMM-based reconstruction techniques have proven effective at whisper reconstruction, and some recent work has proposed the use of artificial pitch derived from formant harmonics as an alternative. This paper describes a new formulation of the formant-harmonic f0 method, and compares this directly against a novel GMM-based f0 estimator, as well as known correct pitch excitation for parallel utterances.

Item Type:	Book section
DOI/Identification number:	10.1109/ISCSLP.2014.6936709
Uncontrolled keywords:	Cepstral analysis, Chinese, Feature extraction, GMM, GMM-based reconstruction techniques, Joints, Modulation, Speech, Speech processing, Vectors, Whisper speech, aphonia, artificial pitch, formant-harmonic f0 method, laryngectomees, natural human-to-human dialogue, natural language processing, natural sounding speech, parallel utterances, pitch reconstruction, primary communications mechanism, regeneration process, secondary vocal communications mechanism, speech processing, speech reconstruction, tonal languages, vocalised speech, whisper conversion, whisper reconstruction, whisper-to-speech conversion
Subjects:	T Technology
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	25 Aug 2015 08:44 UTC
Last Modified:	28 Apr 2026 08:16 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/48809 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008