
A new time-frequency attention tensor network for language identification

Miao, Xiaoxiao; McLoughlin, Ian Vince; Yan, Yonghong (2019) A new time-frequency attention tensor network for language identification. Circuits, Systems, and Signal Processing. ISSN 0278-081X. (doi:10.1007/s00034-019-01286-9)

PDF - Author's Accepted Manuscript
Restricted to Repository staff only
PDF - Publisher pdf

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 4.0 International License.
Official URL
http://dx.doi.org/10.1007/s00034-019-01286-9

Abstract

In this paper, we aim to improve traditional DNN x-vector language identification (LID) performance by employing Wide Residual Networks (WRN) as a powerful feature extractor, which we combine with a novel frequency attention network (F-ATN). Whereas conventional time attention learns weights for individual time frames, our method learns discriminative weights for different frequency bands and uses them to generate weighted means and standard deviations for utterance-level classification. This mechanism enables the architecture to direct attention to important frequency bands rather than to important time frames, as traditional time attention (T-ATN) methods do. We then introduce a cross-layer frequency attention tensor network (CLF-ATN), which exploits information from different layers to recapture frame-level language characteristics that have been dropped by aggressive frequency pooling in lower layers. This effectively restores fine-grained discriminative language details. Finally, we explore the joint fusion of frame-level and frequency-band attention in a time-frequency attention network (TF-ATN). Experimental results show, first, that WRN can significantly outperform a traditional DNN x-vector implementation; second, that the proposed frequency attention method is more effective than time attention; and third, that frequency-time score fusion can yield further improvement. Extensive experiments on CLF-ATN further demonstrate that it is able to improve discrimination by regaining dropped fine-grained frequency information, particularly for low-dimensional frequency features.
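The frequency attention pooling described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes the WRN output has already been summarised as one D-dimensional vector per frequency band (for example, by averaging over time), and it borrows the tanh scorer and the weighted mean/standard-deviation formulas of attentive statistics pooling. The scorer, the parameter names `W`, `b`, `v`, and all function names are hypothetical illustrations, not the paper's published implementation. Feeding per-frame vectors instead of per-band vectors into the same function gives a conventional time attention (T-ATN) baseline.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def band_scores(bands: Sequence[Vector], W: Sequence[Vector],
                b: Vector, v: Vector) -> Vector:
    """Score each band vector h_f as e_f = v . tanh(W h_f + b).

    Hypothetical scorer borrowed from attentive statistics pooling;
    the paper's exact scoring network is not stated on this page.
    """
    scores = []
    for h in bands:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(vi * zi for vi, zi in zip(v, hidden)))
    return scores


def weighted_stats(bands: Sequence[Vector],
                   alpha: Sequence[float]) -> Tuple[Vector, Vector]:
    """Attention-weighted mean and standard deviation, per feature dimension."""
    dim = len(bands[0])
    mean = [sum(a * h[d] for a, h in zip(alpha, bands)) for d in range(dim)]
    var = [sum(a * h[d] ** 2 for a, h in zip(alpha, bands)) - mean[d] ** 2
           for d in range(dim)]
    # Clamp tiny or slightly negative variances caused by rounding.
    std = [math.sqrt(max(x, 1e-10)) for x in var]
    return mean, std


def frequency_attention_pool(bands: Sequence[Vector], W: Sequence[Vector],
                             b: Vector, v: Vector) -> Tuple[Vector, Vector]:
    """Pool F band vectors into one utterance-level [mean ; std] embedding.

    Returns the concatenated embedding and the per-band attention weights.
    Passing per-frame vectors instead gives time attention (T-ATN).
    """
    alpha = softmax(band_scores(bands, W, b, v))
    mean, std = weighted_stats(bands, alpha)
    return mean + std, alpha


if __name__ == "__main__":
    # Two toy bands with D = 2 and an all-zero (untrained) scorer.
    bands = [[1.0, 2.0], [3.0, 2.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    embedding, alpha = frequency_attention_pool(bands, zeros, [0.0, 0.0], [0.0, 0.0])
    print(alpha)      # uniform weights: [0.5, 0.5]
    print(embedding)  # 2 mean values followed by 2 std values
```

With all scoring parameters set to zero, every band receives the same weight and the pooled statistics reduce to the plain mean and standard deviation over bands; trained parameters would instead concentrate weight on the more discriminative bands.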

Item Type: Article
DOI/Identification number: 10.1007/s00034-019-01286-9
Uncontrolled keywords: Language Identification, DNN x-vector, time-frequency attention tensor network, cross-layer frequency tensor attention network
Subjects: T Technology
Divisions: Faculties > Sciences > School of Computing > Data Science
Depositing User: Ian McLoughlin
Date Deposited: 21 Oct 2019 08:04 UTC
Last Modified: 27 Nov 2019 14:16 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/77646 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008