Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor

Xie, Zhi-Peng and Du, Jun and McLoughlin, Ian and Xu, Yong and Ma, Feng and Wang, Haikun (2016) Deep Neural Network for Robust Speech Recognition With Auxiliary Features From Laser-Doppler Vibrometer Sensor. In: 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE. ISBN 978-1-5090-4295-1. E-ISBN 978-1-5090-4294-4. (doi:10.1109/ISCSLP.2016.7918400) (KAR id:57111)

PDF Author's Accepted Manuscript Language: English
PDF Author's Accepted Manuscript Language: English This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: http://dx.doi.org/10.1109/ISCSLP.2016.7918400
Additional URLs: http://www.iscslp2016.org

Abstract

Recently, the signal captured from a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach, in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance, with a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is trained on a limited-size parallel signal data set, then used to form pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features are able to train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.
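
The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the network here is a small one-hidden-layer tanh regressor trained by plain gradient descent, the data are synthetic, and the feature dimensions (8 acoustic, 3 LDV) and the function names `train_regression_mlp`, `predict`, and `augment` are hypothetical choices made only for illustration. It shows the three steps: fit an acoustic-to-LDV regressor on a small parallel set, generate pseudo-LDV features for a larger acoustic-only set, and append them to the acoustic features as auxiliary input for an acoustic model.

```python
import numpy as np


def train_regression_mlp(x, y, hidden=32, epochs=1000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network y ~ f(x) by full-batch gradient
    descent on squared error. A stand-in for the paper's regression DNN."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0 / np.sqrt(x.shape[1]), (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                 # hidden activations
        err = (h @ w2 + b2) - y                  # prediction error
        grad_out = 2.0 * err / len(x)            # gradient w.r.t. the outputs
        grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)  # backprop through tanh
        w2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (x.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return w1, b1, w2, b2


def predict(params, x):
    """Map acoustic features to (pseudo-)LDV features."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def augment(acoustic, ldv_like):
    """Append LDV-like auxiliary features to each acoustic feature frame."""
    return np.concatenate([acoustic, ldv_like], axis=1)


# Synthetic stand-ins: a small parallel set and a larger acoustic-only set.
rng = np.random.default_rng(1)
mixing = rng.normal(0.0, 0.5, (8, 3))            # hypothetical acoustic-to-LDV relation
acoustic_small = rng.normal(size=(200, 8))       # limited-size parallel data
ldv_small = np.tanh(acoustic_small @ mixing)
acoustic_large = rng.normal(size=(1000, 8))      # large set without LDV recordings

params = train_regression_mlp(acoustic_small, ldv_small)
pseudo_ldv = predict(params, acoustic_large)     # pseudo-LDV features
asr_input = augment(acoustic_large, pseudo_ldv)  # frames of 8 + 3 dimensions
```

In a real system the augmented frames would feed the DNN acoustic model; that stage is omitted here.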

Item Type:	Book section
DOI/Identification number:	10.1109/ISCSLP.2016.7918400
Uncontrolled keywords:	laser Doppler vibrometer, auxiliary features, deep neural network, regression model, speech recognition
Subjects:	T Technology
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	06 Sep 2016 08:39 UTC
Last Modified:	28 Apr 2026 08:31 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/57111 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008