Whisper-to-speech conversion using restricted Boltzmann machine arrays

Li, Jing-jie, McLoughlin, Ian Vince, Dai, Li-Rong, Ling, Zhen-hua (2014) Whisper-to-speech conversion using restricted Boltzmann machine arrays. Electronics Letters, 50 (24). pp. 1781-1782. ISSN 0013-5194. E-ISSN 1350-911X. (doi:10.1049/el.2014.1645) (KAR id:48793)

Language: English

Whispers are a natural vocal communication mechanism, in which vocal cords do not vibrate normally. Lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to processing of whispers, including conversion of whispers to speech. Unfortunately, among several approaches, the best reconstructed speech to date still contains obviously artificial muffles and suffers from an unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of estimated pitch is improved using machine learning techniques for pitch estimation within only voiced (V) regions. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with the state-of-the-art approaches.

Item Type: Article
DOI/Identification number: 10.1049/el.2014.1645
Uncontrolled keywords: Gaussian mixture model, RBM arrays, artificial muffle, glottal-induced pitch lead, human-to-human vocal communication mechanism, inherent noise-like spectral distribution, machine learning technique, pitch accuracy, pitch estimation, restricted Boltzmann machine array, speech intelligibility, speech reconstruction, speech spectral envelope, statistical conversion model, unnatural prosody, vocal cord, voiced region, whisper processing, whisper-to-speech conversion
Subjects: T Technology
Divisions: Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 25 Aug 2015 08:36 UTC
Last Modified: 16 Feb 2021 13:25 UTC