
Performance evaluation of deep bottleneck features for spoken language identification

Jiang, Bing, Song, Yan, Wei, Si, Wang, Meng-Ge, McLoughlin, Ian Vince, Dai, Li-Rong (2014) Performance evaluation of deep bottleneck features for spoken language identification. In: Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on, pp. 143-147. IEEE. ISBN 978-1-4799-4219-0. (doi:10.1109/ISCSLP.2014.6936580)

The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided.
Official URL
http://dx.doi.org/10.1109/ISCSLP.2014.6936580

Abstract

Our previous work has shown that Deep Bottleneck Features (DBF), generated from a well-trained Deep Neural Network (DNN), can provide high-performance Language Identification (LID) when Total Variability (TV) modelling is used as the back-end. This may largely be attributed to the powerful capability of the DNN to find a frame-level representation that is robust to variances caused by different speakers, channels and background noise. However, the DBF in the previous work were extracted from a DNN that was trained on a large ASR dataset, and the DBF parameters that are optimal for LID may differ from those known to be optimal for ASR. This paper therefore investigates different DBF extractors, input layer window sizes and dimensionality, and bottleneck layer location. Additionally, principal component analysis (PCA) is used to decorrelate the DBF. Experiments, based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) operating on the NIST LRE 2009 database, are conducted to evaluate the system. The results allow comparison between different DBF extractor parameters, and demonstrate that LID based on DBF can significantly outperform conventional shifted delta cepstral (SDC) features.
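The PCA decorrelation step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code; the feature dimensions and function name are illustrative assumptions. It centres the frame-level features, eigendecomposes their covariance, and projects onto the leading eigenvectors so the resulting dimensions are uncorrelated:

```python
import numpy as np

def pca_decorrelate(features, n_components):
    """Decorrelate frame-level features (e.g. DBF) with PCA.

    features: (n_frames, dim) matrix; returns (n_frames, n_components).
    """
    # Centre the features so the covariance estimate is well-defined
    centred = features - features.mean(axis=0)
    # Eigendecomposition of the (symmetric) covariance matrix
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    projection = eigvecs[:, order]
    # Projecting onto eigenvectors of the covariance diagonalises it,
    # i.e. the output dimensions are mutually uncorrelated
    return centred @ projection

# Hypothetical example: 1000 frames of 43-dimensional bottleneck features
rng = np.random.default_rng(0)
dbf = rng.normal(size=(1000, 43))
decorrelated = pca_decorrelate(dbf, 39)
print(decorrelated.shape)  # (1000, 39)
```

In a GMM-UBM back-end with diagonal covariance Gaussians, decorrelating the features in this way makes the diagonal-covariance assumption a better fit, which is one common motivation for applying PCA before modelling.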

Item Type: Conference or workshop item (Paper)
DOI/Identification number: 10.1109/ISCSLP.2014.6936580
Uncontrolled keywords: Acoustics, Context, DNN training, Feature extraction, GMM-UBM, Gaussian mixture model-universal background model, Gaussian processes, NIST LRE 2009 database, Neural networks, PCA, Principal component analysis, Speech, TV modelling, Training, bottleneck layer location, decorrelation, deep bottleneck feature, deep neural network, deep-bottleneck features, deep-neural network training, frame-level representation, gaussian mixture model, high-performance language identification, input layer window dimensionality, input layer window sizes, language identification, learning (artificial intelligence), mixture models, natural language processing, neural nets, optimal LID DBF extractor parameters, performance evaluation, principal component analysis, shift delta cepstral, signal representation, spoken language identification, total variability modelling
Subjects: T Technology
Divisions: Faculties > Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 25 Aug 2015 08:41 UTC
Last Modified: 29 May 2019 14:39 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48804 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008