Ma, Jin, Song, Yan, McLoughlin, Ian Vince, Dai, Li-Rong, Ye, Zhong-Fu (2016) LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification. In: Odyssey 2016: The Speaker and Language Recognition Workshop. pp. 210-216. (doi:10.21437/odyssey.2016-30) (KAR id:55055)
PDF
Author's Accepted Manuscript
Language: English
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: http://www.isca-speech.org/archive/odyssey_2016/pd...
Abstract
A key problem in spoken language identification (LID) is how to effectively model features from a given speech utterance. Recent techniques such as end-to-end schemes and deep neural networks (DNNs) utilising transfer learning, for example via bottleneck (BN) features, have demonstrated good overall performance but have not addressed the extraction of LID-specific features.
We thus propose a novel end-to-end neural network which aims to obtain effective LID-senone representations, which we define as being analogous to senones in speech recognition. We show that LID-senones combine a compact representation of the original acoustic feature space with a powerful descriptive and discriminative capability. Furthermore, a novel incremental training method is proposed to extract the weak language information buried in the acoustic features of insufficient language resources. Results on the six most confused languages in NIST LRE 2009 show good performance compared to state-of-the-art BN-GMM/i-vector and BN-DNN/i-vector systems. The proposed end-to-end network, coupled with an incremental training method which mitigates over-fitting, has potential not just for LID, but also for other resource-constrained tasks.
Item Type: | Conference or workshop item (Paper) |
---|---|
DOI/Identification number: | 10.21437/odyssey.2016-30 |
Uncontrolled keywords: | language identification; utterance representation; end-to-end neural network; LID-senone; incremental training method |
Subjects: | T Technology |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 19 Apr 2016 10:58 UTC |
Last Modified: | 05 Nov 2024 10:43 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/55055 (The current URI for this page, for reference purposes) |