
Improved i-vector representation for speaker diarization

Xu, Yan, McLoughlin, Ian Vince, Song, Yan, Wu, Kui (2015) Improved i-vector representation for speaker diarization. Circuits, Systems, and Signal Processing, pp. 1-12. ISSN 0278-081X. E-ISSN 1531-5878. (doi:10.1007/s00034-015-0206-2) (KAR id:55023)

PDF: Author's Accepted Manuscript (the final publication is available at Springer)
Language: English

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.


This paper proposes using a previously well-trained deep neural network (DNN) to enhance the i-vector representation used for speaker diarization. In effect, we replace the Gaussian Mixture Model (GMM) typically used to train a Universal Background Model (UBM) with a DNN that has been trained on a different large-scale dataset. To train the T-matrix, we use a supervised UBM instead of a traditional unsupervised UBM derived from a single feature type: the DNN takes filterbank input features to calculate the posterior information, and MFCC features are then used to train the UBM. Next, we jointly use DNN and MFCC features to calculate the zeroth- and first-order Baum-Welch statistics for training an extractor, from which we obtain the i-vector. The system is shown to achieve a significant improvement on the NIST 2008 speaker recognition evaluation (SRE) telephone data task compared to state-of-the-art approaches.
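
As a rough illustration of the statistics mentioned in the abstract, the following minimal sketch computes zeroth- and first-order Baum-Welch statistics from frame-level posteriors and acoustic features. It is not the authors' implementation: the function name `baum_welch_stats`, the array shapes, and the random stand-ins for DNN posteriors and MFCC frames are all assumptions made for the example.

```python
import numpy as np


def baum_welch_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics (hypothetical helper).

    posteriors: (T, C) array, one row of class posteriors per frame
                (assumed here to stand in for DNN outputs).
    features:   (T, D) array, one acoustic feature vector per frame
                (assumed here to stand in for MFCCs).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    features = np.asarray(features, dtype=float)
    if posteriors.shape[0] != features.shape[0]:
        raise ValueError("posteriors and features must cover the same frames")
    # Zeroth order: N_c = sum over frames of the posterior for class c.
    zeroth = posteriors.sum(axis=0)
    # First order: F_c = sum over frames of posterior-weighted features.
    first = posteriors.T @ features
    return zeroth, first


# Toy data only: random softmax outputs and random feature frames.
rng = np.random.default_rng(0)
T, C, D = 200, 4, 3
logits = rng.normal(size=(T, C))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
mfcc = rng.normal(size=(T, D))

N, F = baum_welch_stats(posteriors, mfcc)
# Posterior-weighted class means, one way a supervised UBM's means could be set.
ubm_means = F / N[:, None]
```

Because each frame's posteriors sum to one, the zeroth-order statistics sum to the number of frames, and the first-order statistics summed over classes equal the plain sum of the feature frames. Both are quick sanity checks on any such accumulator.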

Item Type: Article
DOI/Identification number: 10.1007/s00034-015-0206-2
Uncontrolled keywords: Speaker diarization; DNN; i-vector
Subjects: T Technology
Divisions: Faculties > Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 19 Apr 2016 10:13 UTC
Last Modified: 29 May 2019 17:14 UTC

