Li, Zheng-Xi, Song, Yan, Dai, Li-Rong, McLoughlin, Ian (2019) Listening and grouping: an online autoregressive approach for monaural speech separation. IEEE Transactions On Audio Speech And Language Processing, 27 (4). pp. 692-703. ISSN 1558-7916. E-ISSN 2329-9304. (doi:10.1109/TASLP.2019.2892241) (KAR id:71467)
PDF
Author's Accepted Manuscript
Language: English
Official URL: http://dx.doi.org/10.1109/TASLP.2019.2892241
Abstract
This paper proposes an autoregressive approach to harness the power of deep learning for multi-speaker monaural speech separation. It exploits causal temporal context in both the mixture and the previously estimated separated signals, and performs online separation that is compatible with real-time applications. The approach adopts a learned listening and grouping architecture motivated by computational auditory scene analysis, with a grouping stage that effectively addresses the label permutation problem at both the frame and segment levels. Experimental results on the benchmark WSJ0-2mix dataset show that the new approach can outperform the majority of state-of-the-art methods in both closed-set and open-set conditions in terms of signal-to-distortion ratio (SDR) improvement and perceptual evaluation of speech quality (PESQ), including approaches that exploit whole-utterance statistics for separation, while using relatively fewer model parameters.
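For readers unfamiliar with the two ideas the abstract leans on, the label permutation problem and SDR improvement, the following is a minimal illustrative sketch. It is not the paper's method or its evaluation code: the paper handles permutation inside its grouping stage, whereas this sketch simply searches all output-to-speaker assignments at scoring time. The function names (`sdr_db`, `best_permutation_sdr`, `sdr_improvement`) are hypothetical, the signals are toy sinusoids, and the SDR here is a simplified energy ratio rather than a full BSS Eval decomposition.

```python
import math
from itertools import permutations


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy (assumption: no BSS Eval projection)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / (error + 1e-12))


def best_permutation_sdr(references, estimates):
    """Score every assignment of estimates to references and keep the best mean SDR."""
    best_score, best_perm = -math.inf, None
    for perm in permutations(range(len(estimates))):
        score = sum(
            sdr_db(ref, estimates[p]) for ref, p in zip(references, perm)
        ) / len(references)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


def sdr_improvement(references, estimates, mixture):
    """SDR of the separated outputs minus SDR of the unprocessed mixture, averaged over speakers."""
    separated, perm = best_permutation_sdr(references, estimates)
    baseline = sum(sdr_db(ref, mixture) for ref in references) / len(references)
    return separated - baseline, perm


# Toy two-speaker example: the "separator" returns the speakers in swapped order.
s1 = [math.sin(0.3 * n) for n in range(200)]
s2 = [math.sin(1.1 * n + 0.5) for n in range(200)]
mix = [a + b for a, b in zip(s1, s2)]
est = [[0.95 * x for x in s2], [0.95 * x for x in s1]]

gain, perm = sdr_improvement([s1, s2], est, mix)
print(f"best assignment: {perm}, SDR improvement: {gain:.1f} dB")
```

The exhaustive search costs a factorial number of comparisons in the number of speakers, which is acceptable for the two-speaker case the abstract evaluates but is only a scoring convenience here, not a separation strategy.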
| Item Type: | Article |
|---|---|
| DOI/Identification number: | 10.1109/TASLP.2019.2892241 |
| Uncontrolled keywords: | Speech separation, deep learning, label permutation problem, computational auditory scene analysis |
| Subjects: | T Technology |
| Institutional Unit: | Schools > School of Computing |
| Former Institutional Unit: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
| Depositing User: | Ian McLoughlin |
| Date Deposited: | 31 Dec 2018 03:36 UTC |
| Last Modified: | 20 May 2025 10:23 UTC |
| Resource URI: | https://kar.kent.ac.uk/id/eprint/71467 (The current URI for this page, for reference purposes) |
https://orcid.org/0000-0001-7111-2008