End-to-end DNN-CNN Classification for Language Identification

Jin, Ma, Song, Yan, McLoughlin, Ian Vince (2017) End-to-end DNN-CNN Classification for Language Identification. In: Proceedings of The World Congress on Engineering 2017. 1. pp. 119-203. IAENG ISBN 978-988-14-0474-9.

PDF - Author's Accepted Manuscript
Official URL
http://www.iaeng.org/publication/WCE2017/

Abstract

A defining problem in spoken language identification (LID) is how to design effective representations from which language-specific features can be extracted. Recent advances in deep neural networks for feature extraction have led to significant improvements in results, with deep end-to-end methods proving effective. In this paper, a novel network is proposed and explored that models an effective representation using first- and second-order statistics of features extracted from a well-trained phoneme-related DNN bottleneck network followed by a stack of CNN convolutional layers. The high-order statistics extracted through second-order pooling at the output of the CNN are robust to speaker and channel variability and to background noise. Evaluation with NIST LRE 2009 shows improved performance compared to current state-of-the-art systems, achieving over 33% and 20% relative equal error rate (EER) improvement for 3 s and 10 s utterances, respectively.
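
The pooling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the CNN output is a (frames, channels) array and takes the channel mean as the first-order statistic and the channel covariance as the second-order statistic. The function name `stats_pooling` and the covariance formulation are assumptions made for illustration; the abstract does not specify the exact pooling used in the paper.

```python
import numpy as np

def stats_pooling(feature_maps):
    """Pool a (frames, channels) array into one fixed-length vector.

    Hypothetical helper: mean + covariance pooling is an assumed
    stand-in for the paper's first- and second-order statistics.
    """
    x = np.asarray(feature_maps, dtype=np.float64)
    mean = x.mean(axis=0)                      # first-order, shape (channels,)
    centered = x - mean
    cov = centered.T @ centered / x.shape[0]   # second-order, (channels, channels)
    upper = np.triu_indices(cov.shape[0])      # covariance is symmetric: keep upper triangle
    return np.concatenate([mean, cov[upper]])

# Utterances of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
short_vec = stats_pooling(rng.normal(size=(30, 4)))
long_vec = stats_pooling(rng.normal(size=(100, 4)))
```

Because the statistics are averaged over frames, the output length depends only on the number of channels, which is what allows short (3 s) and long (10 s) utterances to share one classifier input size.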

Item Type: Conference or workshop item (Proceeding)
Subjects: T Technology
Divisions: Faculties > Sciences > School of Computing > Data Science
Depositing User: Ian McLoughlin
Date Deposited: 21 Apr 2017 09:28 UTC
Last Modified: 09 Jul 2019 11:19 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/61426 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008