
Performance evaluation of deep bottleneck features for spoken language identification

Jiang, Bing and Song, Yan and Wei, Si and Wang, Meng-Ge and McLoughlin, Ian and Dai, Li-Rong (2014) Performance evaluation of deep bottleneck features for spoken language identification. In: The 9th International Symposium on Chinese Spoken Language Processing. IEEE, pp. 143-147. E-ISBN 978-1-4799-4219-0. (doi:10.1109/ISCSLP.2014.6936580) (KAR id:48804)

The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided.
Official URL:
http://dx.doi.org/10.1109/ISCSLP.2014.6936580

Abstract

Our previous work has shown that Deep Bottleneck Features (DBF), generated from a well-trained Deep Neural Network (DNN), can provide high-performance Language Identification (LID) when Total Variability (TV) modelling is used as the back-end. This may largely be attributed to the powerful capability of the DNN to find a frame-level representation that is robust to variation caused by different speakers, channels and background noise. However, the DBF in the previous work were extracted from a DNN trained on a large ASR dataset, and the optimal DBF parameters for LID may differ from those known to be optimal for ASR. This paper therefore investigates different DBF extractors, input layer window sizes and dimensionality, and bottleneck layer location. Additionally, principal component analysis (PCA) is used to decorrelate the DBF. Experiments, based on a Gaussian Mixture Model-Universal Background Model (GMM-UBM) operating on the NIST LRE 2009 database, are conducted to evaluate the system. Results allow comparison between different DBF extractor parameters and demonstrate that LID based on DBF can significantly outperform conventional shifted delta cepstral (SDC) features.
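To make the pipeline described in the abstract concrete, the short Python sketch below applies PCA decorrelation to frame-level bottleneck features and fits a single diagonal-covariance Gaussian mixture standing in for the UBM. This is an illustration only, not the paper's implementation: the array shapes, component counts, the random placeholder data, and the use of NumPy and scikit-learn are all assumptions introduced here.

    # Minimal sketch (not from the paper): decorrelate frame-level deep
    # bottleneck features (DBF) with PCA, then model them with a GMM-UBM.
    # `dbf` is a placeholder for (n_frames, dbf_dim) bottleneck activations
    # taken from a trained DNN; dimensions and component counts are illustrative.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture

    dbf = np.random.randn(10000, 40)            # stand-in for real DBF frames

    pca = PCA(n_components=30)                  # decorrelate the DBF dimensions
    dbf_pca = pca.fit_transform(dbf)

    # A single diagonal-covariance GMM standing in for the UBM; a real system
    # would train the UBM on pooled multilingual data and then score or adapt
    # per-language models against it.
    ubm = GaussianMixture(n_components=256, covariance_type='diag', max_iter=10)
    ubm.fit(dbf_pca)
    frame_loglik = ubm.score_samples(dbf_pca)   # per-frame log-likelihoods

In the paper's setup the decorrelated DBF would replace SDC features as the input to the GMM-UBM back-end; the sketch only shows where PCA sits in that chain.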

Item Type: Book section
DOI/Identification number: 10.1109/ISCSLP.2014.6936580
Uncontrolled keywords: Acoustics, Context, DNN training, Feature extraction, GMM-UBM, Gaussian mixture model-universal background model, Gaussian processes, NIST LRE 2009 database, Neural networks, PCA, Principal component analysis, Speech, TV modelling, Training, bottleneck layer location, decorrelation, deep bottleneck feature, deep neural network, deep-bottleneck features, deep-neural network training, frame-level representation, gaussian mixture model, high-performance language identification, input layer window dimensionality, input layer window sizes, language identification, learning (artificial intelligence), mixture models, natural language processing, neural nets, optimal LID DBF extractor parameters, performance evaluation, principal component analysis, shift delta cepstral, signal representation, spoken language identification, total variability modelling
Subjects: T Technology
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 25 Aug 2015 08:41 UTC
Last Modified: 17 Aug 2022 10:58 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48804 (The current URI for this page, for reference purposes)
