A training-based speech regeneration approach with cascading mapping models

Sharifzadeh, Hamid Reza, HajiRassouliha, Amir, McLoughlin, Ian Vince, Ardenkani, Iman, Allen, Jaqui, Sarrafzadeh, A. (2017) A training-based speech regeneration approach with cascading mapping models. Computers & Electrical Engineering, 62. pp. 601-611. ISSN 0045-7906. (doi:10.1016/j.compeleceng.2017.06.007) (KAR id:63212)

Author's Accepted Manuscript (PDF). Language: English. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: https://doi.org/10.1016/j.compeleceng.2017.06.007

Abstract

Computational speech reconstruction algorithms have the ultimate aim of returning natural-sounding speech to aphonic and dysphonic patients, as well as to those who can only whisper. In particular, individuals who have lost glottis function due to disease or surgery retain the power of vocal tract modulation to some degree, but they are unable to produce anything more than hoarse whispers without prosthetic aid. While whispering is a natural, secondary mode of speech communication for most people, it becomes the primary mode for those with impaired voice production mechanisms, such as laryngectomees.

In this paper, in light of the current limitations of speech reconstruction methods, a novel algorithm for converting whispers to normal speech is proposed and its efficiency is explored. The algorithm relies upon cascading mapping models and makes use of artificially generated whispers (called whisperised speech) to regenerate natural phonated speech from whispers. Using a training-based approach, the mapping models exploit whisperised speech to overcome the frame-to-frame time alignment problems that are inherent in the speech reconstruction process. The algorithm effectively regenerates information that is missing in conventional frameworks of phonated speech reconstruction, and it outperforms the current state-of-the-art regeneration methods on both subjective and objective criteria.

Item Type:	Article
DOI/Identification number:	10.1016/j.compeleceng.2017.06.007
Uncontrolled keywords:	Speech reconstruction; Whispers; Electrolarynx; Laryngectomy; Time alignment
Subjects:	T Technology
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Data Science Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	04 Sep 2017 13:39 UTC
Last Modified:	20 May 2025 10:20 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/63212

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008