Tang, Jian, Song, Yan, Dai, Li-Rong and McLoughlin, Ian Vince (2018) Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition. In: Interspeech 2018 (ISCA). (doi:10.21437/Interspeech.2018-1089) (KAR id:67452)
PDF (Author's Accepted Manuscript, English, 895 kB)

Official URL: http://dx.doi.org/10.21437/Interspeech.2018-1089
Abstract
Motivated by recent advances in computer vision research, this paper proposes a novel acoustic model called the Densely Connected Residual Network (DenseRNet) for multichannel speech recognition, which combines the strengths of both DenseNet and ResNet. It adopts ResNet-style "building blocks" with different convolutional layers, receptive field sizes and growth rates as basic components, which are densely connected to form so-called denseR blocks. By concatenating the feature maps of all preceding layers as inputs, DenseRNet not only strengthens gradient back-propagation against the vanishing-gradient problem, but also exploits multi-resolution feature maps. Preliminary experiments on CHiME-3 show that DenseRNet, trained with the cross-entropy criterion, achieves a word error rate (WER) of 7.58% on beamforming-enhanced speech from the six-channel real test data, compared with 10.23% for the official baseline. Additional results further demonstrate that DenseRNet is robust to beamforming-enhanced speech as well as to near- and far-field speech.
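The denseR block described in the abstract combines ResNet-style residual units with DenseNet-style concatenation of all preceding feature maps. The following is a minimal sketch of that idea, assuming a PyTorch-style 2D convolutional front end; the layer counts, kernel sizes and growth rate are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ResidualUnit(nn.Module):
    """Basic ResNet-style building block: two convolutions plus a skip connection."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))


class DenseRBlock(nn.Module):
    """Densely connected residual units: each unit takes the concatenation of
    the block input and all preceding unit outputs as its input."""

    def __init__(self, in_channels, growth_rate=16, num_units=3, kernel_size=3):
        super().__init__()
        self.units = nn.ModuleList()
        channels = in_channels
        for _ in range(num_units):
            self.units.append(nn.Sequential(
                # 1x1 projection down to the growth rate, then a residual unit
                nn.Conv2d(channels, growth_rate, kernel_size=1),
                ResidualUnit(growth_rate, kernel_size),
            ))
            channels += growth_rate  # dense concatenation widens the next input

    def forward(self, x):
        features = [x]
        for unit in self.units:
            out = unit(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)


if __name__ == "__main__":
    # e.g. a (batch, channels, time, frequency) filterbank-style input
    dummy = torch.randn(2, 8, 100, 40)
    block = DenseRBlock(in_channels=8, growth_rate=16, num_units=3)
    print(block(dummy).shape)  # channels grow to 8 + 3 * 16 = 56
```

In this sketch the dense concatenation is what lets later units see earlier feature maps directly, which is the mechanism the abstract credits for both stronger gradient flow and multi-resolution feature reuse.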
| Item Type | Conference or workshop item (Paper) |
|---|---|
| DOI/Identification number | 10.21437/Interspeech.2018-1089 |
| Uncontrolled keywords | DenseNet, robust acoustic model, ResNet, speech recognition, CHiME-3 |
| Subjects | T Technology |
| Divisions | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
| Depositing User | Ian McLoughlin |
| Date Deposited | 29 Jun 2018 09:23 UTC |
| Last Modified | 05 Nov 2024 11:07 UTC |
| Resource URI | https://kar.kent.ac.uk/id/eprint/67452 (the current URI for this page, for reference purposes) |