Jin, Ma, Song, Yan, McLoughlin, Ian Vince (2017) End-to-end DNN-CNN Classification for Language Identification. In: Proceedings of The World Congress on Engineering 2017. 1. pp. 119-203. IAENG ISBN 978-988-14-0474-9. (KAR id:61426)
PDF
Author's Accepted Manuscript
Language: English
Official URL: http://www.iaeng.org/publication/WCE2017/
Abstract
A defining problem in spoken language identification (LID) is how to design effective representations that allow language-specific features to be extracted.
Recent advances in deep neural networks for feature extraction have led to significant improvements in results, with deep end-to-end methods proving effective.
This paper proposes and explores a novel network that builds an effective representation from first- and second-order statistics of features extracted by a well-trained, phoneme-related DNN bottleneck network followed by a stack of convolutional (CNN) layers.
The high-order statistics extracted through second-order pooling at the output of the CNN are robust to speaker and channel variability and to background noise.
Evaluation on NIST LRE 2009 shows improved performance over current state-of-the-art systems, with relative equal error rate (EER) improvements of over 33% and 20% for 3 s and 10 s utterances, respectively.
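As a rough illustration of what pooling first- and second-order statistics can look like, the sketch below collapses a variable-length sequence of frame-level feature vectors into a single fixed-length vector made of the per-dimension means and the upper triangle of the covariance matrix. This is a generic formulation under stated assumptions, not the network described in the paper: the function name `stats_pooling`, the use of a biased covariance, and the plain-list input format are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def stats_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool T frames of D-dimensional features into one fixed-length vector.

    Hypothetical helper for illustration only. The output holds the D means
    (first-order statistics) followed by the D*(D+1)/2 upper-triangular
    entries of the biased covariance matrix (second-order statistics).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")

    # First-order statistics: mean of each feature dimension over time.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]

    # Second-order statistics: covariance between every pair of dimensions.
    centered = [[f[d] - mean[d] for d in range(dim)] for f in frames]
    second = [
        sum(c[i] * c[j] for c in centered) / n
        for i in range(dim)
        for j in range(i, dim)  # the matrix is symmetric, keep one triangle
    ]
    return mean + second


# Utterances of different lengths map to vectors of the same size.
short_utt = [[1.0, 2.0], [3.0, 6.0]]
long_utt = [[1.0, 2.0], [3.0, 6.0], [1.0, 2.0], [3.0, 6.0]]
print(stats_pooling(short_utt))
print(len(stats_pooling(short_utt)) == len(stats_pooling(long_utt)))
```

Because the output length depends only on the feature dimension, utterances of 3 s and 10 s would yield vectors of the same size under this scheme, which is what lets a fixed-size classifier sit on top of the pooled representation.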
Item Type: | Conference or workshop item (Proceeding) |
---|---|
Subjects: | T Technology |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 21 Apr 2017 09:28 UTC |
Last Modified: | 16 Feb 2021 13:44 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/61426 (The current URI for this page, for reference purposes) |