McLoughlin, Ian Vince, Li, Jingjie, Song, Yan, Sharifzadeh, Hamid Reza (2017) Speech reconstruction using a deep partially supervised neural network. IET Healthcare Technology Letters, 4 (4). pp. 129-133. ISSN 2053-3713. E-ISSN 2053-3713. (doi:10.1049/htl.2016.0103) (KAR id:61425)
PDF
Author's Accepted Manuscript
Language: English
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Official URL: http://dx.doi.org/10.1049/htl.2016.0103
Abstract
Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network-based systems have been hampered by the limited amount of training data available from individual voice-loss patients.
We propose a novel deep neural network structure that allows a partially supervised training approach on spectral features from smaller datasets, yielding very good results compared to the current state of the art.
| Item Type: | Article |
|---|---|
| DOI/Identification number: | 10.1049/htl.2016.0103 |
| Uncontrolled keywords: | Speech reconstruction, post-laryngectomy speech, statistical voice conversion |
| Subjects: | T Technology > T Technology (General) |
| Institutional Unit: | Schools > School of Computing |
| Former Institutional Unit: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
| Depositing User: | Ian McLoughlin |
| Date Deposited: | 21 Apr 2017 09:18 UTC |
| Last Modified: | 22 Jul 2025 08:58 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/61425 |