Xie, Zhipeng, McLoughlin, Ian, Zhang, Haomin, Song, Yan, Xiao, Wei (2016) A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features. Digital Signal Processing, 54 . pp. 119-128. ISSN 1051-2004. (doi:10.1016/j.dsp.2016.04.005) (KAR id:55016)
PDF (A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features)
Author's Accepted Manuscript
Language: English
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A second PDF copy of the Author's Accepted Manuscript is restricted to repository staff only.
Official URL: http://dx.doi.org/10.1016/j.dsp.2016.04.005
Abstract
Machine hearing is an emerging research field that is analogous to machine vision in that it aims to equip computers with the ability to hear and recognise a variety of sounds. It is a key enabler of natural human–computer speech interfacing, as well as of applications in areas such as automated security surveillance, environmental monitoring, and smart homes, buildings and cities. Recent advances in machine learning allow current systems to accurately recognise a diverse range of sounds under controlled conditions. However, doing so in real-world noisy conditions remains a challenging task. Several front-end feature extraction methods have been used for machine hearing, employing speech recognition features like MFCC and PLP, as well as image-like features such as AIM and SIF. The best choice of feature is found to depend upon the noise environment and the machine learning techniques used. Machine learning methods such as deep neural networks have been shown to be capable of inferring discriminative classification rules from less structured front-end features in related domains. In the machine hearing field, spectrogram image features have recently shown good performance for noise-corrupted classification using deep neural networks. However, there are many methods of extracting features from spectrograms. This paper explores a novel data-driven feature extraction method that uses variance-based criteria to define spectral pooling of features from spectrograms. The proposed method, based on maximising the pooled spectral variance of foreground and background sound models, is shown to achieve very good performance for robust classification.
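The abstract does not give the algorithm itself, so the following is only an illustrative sketch of what variance-driven spectral pooling could look like, not the paper's method. It assumes a simple stand-in criterion: contiguous frequency bands are chosen so that each holds a roughly equal share of the total per-bin variance, and the bins in each band are then averaged. The function names `variance_band_edges` and `pool_spectrogram`, the array layout, and the equal-share rule are all hypothetical; the foreground and background sound models and the maximisation step mentioned in the abstract are not reproduced here.

```python
import numpy as np


def variance_band_edges(spectrograms, n_bands):
    """Split frequency bins into contiguous bands of roughly equal variance.

    spectrograms: array of shape (n_examples, n_bins, n_frames), assumed
    to have non-zero total variance.
    Returns an increasing integer array of band edges from 0 to n_bins.
    NOTE: the equal-share rule is an assumption made for illustration.
    """
    spectrograms = np.asarray(spectrograms, dtype=float)
    n_bins = spectrograms.shape[1]
    # Variance of each frequency bin over all examples and frames.
    per_bin_var = spectrograms.transpose(1, 0, 2).reshape(n_bins, -1).var(axis=1)
    cumulative = np.cumsum(per_bin_var)
    cumulative = cumulative / cumulative[-1]
    # Place an inner edge where the cumulative share reaches k / n_bands.
    targets = np.arange(1, n_bands) / n_bands
    inner = np.searchsorted(cumulative, targets) + 1
    # np.unique drops repeated edges, so fewer than n_bands bands may result
    # when a few bins dominate the variance.
    return np.unique(np.concatenate(([0], inner, [n_bins])))


def pool_spectrogram(spectrogram, edges):
    """Average the frequency bins of one (n_bins, n_frames) spectrogram per band."""
    spectrogram = np.asarray(spectrogram, dtype=float)
    sums = np.add.reduceat(spectrogram, edges[:-1], axis=0)
    return sums / np.diff(edges)[:, None]


# Toy data: 3 examples, 8 bins, 10 frames, every bin with identical variance,
# so the bands come out evenly spaced.
demo = np.tile(np.array([-1.0, 1.0]), (3, 8, 5))
edges = variance_band_edges(demo, n_bands=4)
pooled = pool_spectrogram(demo[0], edges)
```

With real data the per-bin variances differ, so bands would become narrow where the variance is concentrated and wide elsewhere. Whether that matches the pooling used in the paper cannot be determined from the abstract alone.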
Item Type: | Article |
---|---|
DOI/Identification number: | 10.1016/j.dsp.2016.04.005 |
Uncontrolled keywords: | Machine hearing; auditory event detection; robust auditory classification |
Subjects: | T Technology > T Technology (General) |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 19 Apr 2016 08:34 UTC |
Last Modified: | 05 Nov 2024 10:43 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/55016 (The current URI for this page, for reference purposes) |