Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition

Tang, Jian, Song, Yan, Dai, Li-Rong, McLoughlin, Ian Vince (2018) Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition. In: ISCA Conference. (doi:10.21437/Interspeech.2018-1089) (KAR id:67452)

PDF Author's Accepted Manuscript
Language: English
Official URL
http://dx.doi.org/10.21437/Interspeech.2018-1089

Abstract

Motivated by recent advances in computer vision research, this paper proposes a novel acoustic model, the Densely Connected Residual Network (DenseRNet), for multichannel speech recognition. DenseRNet combines the strengths of DenseNet and ResNet: it takes the basic "building blocks" of ResNet, with different convolutional layers, receptive field sizes and growth rates, as basic components and densely connects them to form so-called denseR blocks. By concatenating the feature maps of all preceding layers as inputs, DenseRNet not only strengthens gradient back-propagation, mitigating the vanishing-gradient problem, but also exploits multi-resolution feature maps. Preliminary experimental results on CHiME-3 show that DenseRNet, trained with the cross-entropy criterion, achieves a word error rate (WER) of 7.58% on beamforming-enhanced speech from the six-channel real test data, compared with 10.23% for the official baseline. Additional experimental results demonstrate that DenseRNet is robust to beamforming-enhanced speech as well as to near-field and far-field speech.
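
The dense connectivity described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pointwise_conv`, `residual_unit`, `dense_r_block`), the use of 1x1 (channel-mixing) convolutions only, the single shared growth rate, and the random weights are all simplifying assumptions made for illustration. The abstract itself states that the real units vary in convolutional layers, receptive field sizes and growth rates. The sketch only shows the two ideas being combined: each unit adds a residual shortcut internally, and each unit receives the concatenation of all preceding feature maps as input.

```python
import random


def pointwise_conv(x, weights):
    """1x1 convolution: mix channels independently at each time step.

    x is a list of channels, each a list of T floats.
    weights is a [C_out][C_in] matrix.
    """
    num_frames = len(x[0])
    return [
        [sum(w * ch[t] for w, ch in zip(row, x)) for t in range(num_frames)]
        for row in weights
    ]


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def residual_unit(x, w_proj, w_res):
    """Hypothetical residual unit: project to growth_rate channels,
    then return h + F(h), where F is ReLU followed by a 1x1 convolution."""
    h = pointwise_conv(x, w_proj)
    f = pointwise_conv(relu(h), w_res)
    return [[a + b for a, b in zip(hc, fc)] for hc, fc in zip(h, f)]


def dense_r_block(x, num_units, growth_rate, rng):
    """Densely connect residual units: every unit sees the concatenation
    of the block input and the outputs of all earlier units."""
    features = [x]
    for _ in range(num_units):
        concat = [ch for fmap in features for ch in fmap]
        c_in = len(concat)
        # Random weights stand in for trained parameters (assumption).
        w_proj = [[rng.uniform(-0.1, 0.1) for _ in range(c_in)]
                  for _ in range(growth_rate)]
        w_res = [[rng.uniform(-0.1, 0.1) for _ in range(growth_rate)]
                 for _ in range(growth_rate)]
        features.append(residual_unit(concat, w_proj, w_res))
    # The block output is the concatenation of all feature maps.
    return [ch for fmap in features for ch in fmap]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy input: 4 channels, 5 time frames.
    x = [[float(c + t) for t in range(5)] for c in range(4)]
    out = dense_r_block(x, num_units=3, growth_rate=2, rng=rng)
    print(len(out))  # 10 channels = 4 input + 3 units * growth rate 2
```

In this sketch the channel count grows linearly with the number of units, and the input channels pass through unchanged to the block output, which is what gives later layers direct access to earlier feature maps.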

Item Type: Conference or workshop item (Paper)
DOI/Identification number: 10.21437/Interspeech.2018-1089
Uncontrolled keywords: DenseNet, robust acoustic model, ResNet, speech recognition, CHiME-3
Subjects: T Technology
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 29 Jun 2018 09:23 UTC
Last Modified: 16 Feb 2021 13:55 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/67452 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008