Qian, Mengjie and McLoughlin, Ian and Guo, Wu and Dai, Lirong (2017) Mismatched Training Data Enhancement for Automatic Recognition of Children’s Speech using DNN-HMM. In: 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE. ISBN 978-1-5090-4295-1. E-ISBN 978-1-5090-4294-4. (doi:10.1109/ISCSLP.2016.7918386) (KAR id:57110)
PDF
Author's Accepted Manuscript
Language: English
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: http://dx.doi.org/10.1109/ISCSLP.2016.7918386
Abstract
The increasing profusion of commercial automatic speech recognition technology applications has been driven by big-data techniques, using high-quality labelled speech datasets. Children's speech has greater time- and frequency-domain variability than typical adult speech, lacks good large-scale training data, and presents difficulties relating to capture quality. Each of these factors reduces the performance of systems that automatically recognise children's speech. In this paper, children's speech recognition is investigated using a hybrid acoustic modelling approach based on deep neural networks and Gaussian mixture models with hidden Markov model back ends. We explore the incorporation of mismatched training data to achieve a better acoustic model and improve performance in the face of limited training data, as well as training data augmentation using noise. We also explore two arrangements for vocal tract length normalisation and a gender-based data selection technique suitable for training a children's speech recogniser.
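The abstract mentions training data augmentation using noise but does not state the noise type, the signal-to-noise ratios, or the mixing procedure. As a rough illustration only, the sketch below shows one common way to do this: scaling a noise recording so that it mixes with a clean utterance at a chosen SNR. The function names `add_noise_at_snr` and `measured_snr_db`, and the plain list-of-samples representation, are assumptions made for this example and are not taken from the paper.

```python
import math


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; the paper's actual augmentation
    procedure is not described in this record.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]

    # Average power of each signal.
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # Choose gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Compute the SNR (dB) of a noisy signal relative to its clean source."""
    signal_energy = sum(s * s for s in speech)
    noise_energy = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(signal_energy / noise_energy)
```

In a setup like this, each clean training utterance could be mixed at several SNR values to produce additional training examples; which values suit children's speech is an empirical question that this sketch does not address.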
Item Type: | Book section |
---|---|
DOI/Identification number: | 10.1109/ISCSLP.2016.7918386 |
Uncontrolled keywords: | speech; training data; hidden Markov models; speech recognition; data models; acoustics |
Subjects: | T Technology |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 06 Sep 2016 08:29 UTC |
Last Modified: | 05 Nov 2024 10:47 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/57110 (The current URI for this page, for reference purposes) |