LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification

Ma, Jin, Song, Yan, McLoughlin, Ian Vince, Dai, Li-Rong, Ye, Zhong-Fu (2016) LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification. In: Odyssey 2016: The Speaker and Language Recognition Workshop. pp. 210-216. (doi:10.21437/odyssey.2016-30) (KAR id:55055)

PDF (Author's Accepted Manuscript). Language: English. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: http://www.isca-speech.org/archive/odyssey_2016/pd...

Abstract

A key problem in spoken language identification (LID) is how to effectively model features from a given speech utterance. Recent techniques, such as end-to-end schemes and deep neural networks (DNNs) utilising transfer learning in the form of bottleneck (BN) features, have demonstrated good overall performance but have not addressed the extraction of LID-specific features.

We therefore propose a novel end-to-end neural network that aims to obtain effective LID-senone representations, which we define by analogy with senones in speech recognition. We show that LID-senones combine a compact representation of the original acoustic feature space with a powerful descriptive and discriminative capability. Furthermore, a novel incremental training method is proposed to extract the weak language information buried in the acoustic features when language resources are insufficient. Results on the six most confused languages in NIST LRE 2009 show good performance compared to state-of-the-art BN-GMM/i-vector and BN-DNN/i-vector systems. The proposed end-to-end network, coupled with an incremental training method that mitigates over-fitting, has potential not just for LID but also for other resource-constrained tasks.

Item Type:	Conference or workshop item (Paper)
DOI/Identification number:	10.21437/odyssey.2016-30
Uncontrolled keywords:	language identification; utterance representation; end-to-end neural network; LID-senone; incremental training method
Subjects:	T Technology
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	19 Apr 2016 10:58 UTC
Last Modified:	05 Nov 2024 10:43 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/55055 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008