Whisper-to-speech conversion using restricted Boltzmann machine arrays

Li, Jing-jie, McLoughlin, Ian Vince, Dai, Li-Rong, Ling, Zhen-hua (2014) Whisper-to-speech conversion using restricted Boltzmann machine arrays. Electronics Letters, 50 (24). pp. 1781-1782. ISSN 0013-5194. E-ISSN 1350-911X. (doi:10.1049/el.2014.1645) (KAR id:48793)

Official URL: http://dx.doi.org/10.1049/el.2014.1645
Additional URLs: http://digital-library.theiet.org/conten...

Abstract

Whispers are a natural vocal communication mechanism in which the vocal cords do not vibrate normally. The lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to the processing of whispers, including the conversion of whispers to speech. Unfortunately, even the best reconstructed speech produced by existing approaches still sounds noticeably muffled and artificial, and suffers from unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of the estimated pitch is improved by applying machine learning techniques to pitch estimation within voiced (V) regions only. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with state-of-the-art approaches.
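
For readers unfamiliar with RBMs, the following is a minimal illustrative sketch of how a single RBM might model paired feature vectors. It is not the authors' implementation and does not reproduce the RBM array described in the paper. The class name `ToyRBM`, the helper `convert`, the binary (Bernoulli) units, and the idea of clamping the whisper half of a joint vector are all assumptions made for illustration; real spectral envelopes are real-valued, and a practical system would likely need a different unit type and training setup.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ToyRBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0, lr=0.1):
        """Apply one CD-1 update on a batch; return the mean squared reconstruction error."""
        ph0 = self.hidden_probs(v0)                              # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)    # sample hidden states
        pv1 = self.visible_probs(h0)                             # reconstruction (probabilities)
        ph1 = self.hidden_probs(pv1)                             # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.a += lr * (v0 - pv1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))


def convert(rbm, whisper, n_speech):
    """Hypothetical conversion step: keep the whisper half, fill the speech half
    with 0.5, then read the speech half back after one up-down pass."""
    filler = np.full((whisper.shape[0], n_speech), 0.5)
    v = np.hstack([whisper, filler])
    v_recon = rbm.visible_probs(rbm.hidden_probs(v))
    return v_recon[:, -n_speech:]
```

In this toy setup each training row is a joint vector `[whisper features, speech features]`, so the RBM learns a joint distribution over both halves, and `convert` uses that joint model to fill in the missing speech half.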

Item Type:	Article
DOI/Identification number:	10.1049/el.2014.1645
Uncontrolled keywords:	Gaussian mixture model, RBM arrays, artificial muffle, glottal-induced pitch lead, human-to-human vocal communication mechanism, inherent noise-like spectral distribution, machine learning technique, pitch accuracy, pitch estimation, restricted Boltzmann machine array, speech intelligibility, speech reconstruction, speech spectral envelope, statistical conversion model, unnatural prosody, vocal cord, voiced region, whisper processing, whisper-to-speech conversion
Subjects:	T Technology
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	25 Aug 2015 08:36 UTC
Last Modified:	28 Apr 2026 08:16 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/48793 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008