Kent Academic Repository

End-to-end DNN-CNN Classification for Language Identification

Jin, Ma, Song, Yan, McLoughlin, Ian Vince (2017) End-to-end DNN-CNN Classification for Language Identification. In: Proceedings of The World Congress on Engineering 2017. 1. pp. 119-203. IAENG ISBN 978-988-14-0474-9. (KAR id:61426)

Abstract

A defining problem in spoken language identification (LID) is how to design effective representations from which language-specific features can be extracted.

Recent advances in deep neural networks for feature extraction have led to significant improvements in results, with deep end-to-end methods proving effective.

In this paper, a novel network is proposed and explored that models an effective representation using first- and second-order statistics of features extracted from a well-trained phoneme-related DNN bottleneck network followed by a stack of convolutional layers.

The high-order statistics extracted through second-order pooling at the output of the CNN are robust to speaker variability, channel variability, and background noise.
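
As a rough illustration of the pooling idea, the sketch below reduces a variable-length sequence of frame-level features to a fixed-length vector made of the mean (first-order) and the covariance (second-order). The abstract does not give the exact formulation used in the paper, so the function name `second_order_pooling`, the biased covariance estimate, the feature dimension, and the frame counts are all assumptions made for this example.

```python
import numpy as np

def second_order_pooling(features):
    """Pool a (T, D) array of frame-level features into one fixed-length vector.

    Illustrative assumption only: concatenates the per-dimension mean with the
    upper triangle of the (biased) covariance matrix computed over time.
    """
    features = np.asarray(features, dtype=np.float64)
    num_frames, dim = features.shape
    mean = features.mean(axis=0)               # first-order statistics, shape (D,)
    centered = features - mean
    cov = centered.T @ centered / num_frames   # second-order statistics, shape (D, D)
    upper = cov[np.triu_indices(dim)]          # covariance is symmetric, keep one half
    return np.concatenate([mean, upper])

# Hypothetical inputs: a short and a long utterance with 8-dimensional features.
rng = np.random.default_rng(0)
pooled_short = second_order_pooling(rng.normal(size=(30, 8)))
pooled_long = second_order_pooling(rng.normal(size=(100, 8)))
tiny = second_order_pooling([[1.0, 2.0], [3.0, 4.0]])
```

Both utterances map to vectors of the same length regardless of duration, which is what lets a fixed-size classifier sit on top of the pooled output.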

Evaluation on NIST LRE 2009 shows improved performance compared to current state-of-the-art systems, achieving over 33% and 20% relative equal error rate (EER) improvement for 3 s and 10 s utterances, respectively.
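
For readers unfamiliar with the metric, a relative EER improvement is conventionally the reduction in EER divided by the baseline EER. The sketch below shows that arithmetic; the helper name `relative_improvement` and the EER values are made up for illustration and are not results from the paper.

```python
def relative_improvement(baseline_eer, new_eer):
    """Return the relative EER improvement over a baseline, in percent."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - new_eer) / baseline_eer * 100.0

# Hypothetical EERs (in percent), chosen only to show the calculation.
example = relative_improvement(baseline_eer=6.0, new_eer=4.0)
```

With these illustrative numbers, an absolute drop of 2 percentage points corresponds to a relative improvement of about one third.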

Item Type: Conference or workshop item (Proceeding)
Subjects: T Technology
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 21 Apr 2017 09:28 UTC
Last Modified: 16 Feb 2021 13:44 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/61426 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID: https://orcid.org/0000-0001-7111-2008
