
Speech reconstruction using a deep partially supervised neural network

McLoughlin, Ian Vince, Li, Jingjie, Song, Yan, Sharifzadeh, Hamid Reza (2017) Speech reconstruction using a deep partially supervised neural network. IET Healthcare Technology Letters, 4 (4). pp. 129-133. ISSN 2053-3713. E-ISSN 2053-3713. (doi:10.1049/htl.2016.0103) (KAR id:61425)

PDF Author's Accepted Manuscript
Language: English


Official URL:
http://dx.doi.org/10.1049/htl.2016.0103

Abstract

Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network-based systems have been hampered by the limited amount of training data available from individual voice-loss patients.

We propose a novel deep neural network structure that allows a partially supervised training approach on spectral features from smaller datasets, yielding very good results compared to the current state of the art.

Item Type: Article
DOI/Identification number: 10.1049/htl.2016.0103
Uncontrolled keywords: Speech reconstruction, post-laryngectomy speech, statistical voice conversion
Subjects: T Technology > T Technology (General)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 21 Apr 2017 09:18 UTC
Last Modified: 10 Dec 2022 05:22 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/61425 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008