Li, Jing-jie, McLoughlin, Ian Vince, Dai, Li-Rong, Ling, Zhen-hua (2014) Whisper-to-speech conversion using restricted Boltzmann machine arrays. Electronics Letters, 50 (24). pp. 1781-1782. ISSN 0013-5194. E-ISSN 1350-911X. (doi:10.1049/el.2014.1645) (KAR id:48793)
Language: English
Official URL: http://dx.doi.org/10.1049/el.2014.1645
Abstract
Whispers are a natural vocal communication mechanism, in which vocal cords do not vibrate normally. Lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to processing of whispers, including conversion of whispers to speech. Unfortunately, among several approaches, the best reconstructed speech to date still contains obviously artificial muffles and suffers from an unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of estimated pitch is improved using machine learning techniques for pitch estimation within only voiced (V) regions. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with the state-of-the-art approaches.
Item Type: | Article |
---|---|
DOI/Identification number: | 10.1049/el.2014.1645 |
Uncontrolled keywords: | Gaussian mixture model, RBM arrays, artificial muffle, glottal-induced pitch lead, human-to-human vocal communication mechanism, inherent noise-like spectral distribution, machine learning technique, pitch accuracy, pitch estimation, restricted Boltzmann machine array, speech intelligibility, speech reconstruction, speech spectral envelope, statistical conversion model, unnatural prosody, vocal cord, voiced region, whisper processing, whisper-to-speech conversion |
Subjects: | T Technology |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 25 Aug 2015 08:36 UTC |
Last Modified: | 05 Nov 2024 10:32 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/48793 (The current URI for this page, for reference purposes) |