Phonated Speech Reconstruction Using Twin Mapping Models

Sharifzadeh, Hamid R. and HajiRassouliha, Amir and McLoughlin, Ian V. and Ardekani, Imam and Allen, Jacqueline E. (2016) Phonated Speech Reconstruction Using Twin Mapping Models. In: 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE. ISBN 978-1-5090-0480-5. E-ISBN 978-1-5090-0481-2. (doi:10.1109/ISSPIT.2015.7394247) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:51758)

PDF: Author's Accepted Manuscript. Language: English. Restricted to repository staff only.

Official URL: http://dx.doi.org/10.1109/ISSPIT.2015.7394247

Abstract

Computational speech reconstruction algorithms have the ultimate aim of returning natural-sounding speech to aphonic and dysphonic individuals. These algorithms can also be used by unimpaired speakers for communicating sensitive or private information. When the glottis loses function due to disease or surgery, aphonic and dysphonic patients retain the power of vocal tract modulation to some degree, but they are unable to produce anything more than hoarse whispers without prosthetic aid. While whispering can be seen as a natural and secondary aspect of speech communication for most people, it becomes the primary mechanism of communication for those who have impaired voice production mechanisms, such as laryngectomees. In this paper, by considering the current limitations of speech reconstruction methods, a novel algorithm for converting whispers to normal speech is proposed and the efficiency of the algorithm is discussed. The proposed algorithm relies upon twin mapping models and makes use of artificially generated whispers (called whisperised speech) to regenerate natural phonated speech from whispers. Through a training-based approach, the mapping models exploit whisperised speech to overcome the frame-to-frame time alignment problem in the speech reconstruction process.

Item Type:	Book section
DOI/Identification number:	10.1109/ISSPIT.2015.7394247
Uncontrolled keywords:	Speech reconstruction, laryngectomy, laryngectomee, post-laryngectomised, phonation, whispers, statistical voice conversion
Subjects:	T Technology
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	12 Nov 2015 11:54 UTC
Last Modified:	20 May 2025 10:17 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/51758 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian V.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008
