
A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features

Xie, Zhi-Peng, McLoughlin, Ian Vince, Zhang, Hao-min, Song, Yan, Xiao, Wei (2016) A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features. Digital Signal Processing. ISSN 1051-2004. (doi:10.1016/j.dsp.2016.04.005)

PDF: Author's Accepted Manuscript

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL
http://dx.doi.org/10.1016/j.dsp.2016.04.005

Abstract

Machine hearing is an emerging research field that is analogous to machine vision in that it aims to equip computers with the ability to hear and recognise a variety of sounds. It is a key enabler of natural human-computer speech interfacing, as well as of applications in areas such as automated security surveillance, environmental monitoring, and smart homes, buildings and cities. Recent advances in machine learning allow current systems to accurately recognise a diverse range of sounds under controlled conditions. However, doing so in real-world noisy conditions remains a challenging task. Several front-end feature extraction methods have been used for machine hearing, employing speech recognition features like MFCC and PLP, as well as image-like features such as AIM and SIF. The best choice of feature is found to depend upon the noise environment and the machine learning techniques used. Machine learning methods such as deep neural networks have been shown to be capable of inferring discriminative classification rules from less structured front-end features in related domains. In the machine hearing field, spectrogram image features have recently shown good performance for noise-corrupted classification using deep neural networks. However, there are many methods of extracting features from spectrograms. This paper explores a novel data-driven feature extraction method that uses variance-based criteria to define spectral pooling of features from spectrograms. The proposed method, based on maximising the pooled spectral variance of foreground and background sound models, is shown to achieve very good performance for robust classification.
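The abstract does not give the algorithm itself, so the following is only an illustrative sketch of what variance-guided spectral pooling of a spectrogram might look like. It uses a simpler heuristic than the one the paper describes: band edges are placed so that each pooled band holds a roughly equal share of the per-bin variance, and no foreground or background sound models are involved. The function names, frame length, hop size and band count are all assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x bins) magnitude spectrogram using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def variance_band_edges(spec, n_bands):
    """Place band edges so each band holds a roughly equal share of the
    per-bin variance (an assumed heuristic, not the paper's criterion)."""
    n_bins = spec.shape[1]
    var = spec.var(axis=0)              # variance of each frequency bin over time
    total = var.sum()
    if total == 0:                      # flat input: fall back to uniform bands
        return np.unique(np.linspace(0, n_bins, n_bands + 1).astype(int))
    cum = np.cumsum(var) / total        # cumulative variance share, ends near 1
    targets = np.arange(1, n_bands) / n_bands
    inner = np.searchsorted(cum, targets) + 1
    edges = np.concatenate(([0], inner, [n_bins]))
    return np.unique(edges)             # sorted, duplicates merged

def pool_bands(spec, edges):
    """Average the bins inside each band, giving a (frames x bands) feature."""
    return np.stack([spec[:, a:b].mean(axis=1)
                     for a, b in zip(edges[:-1], edges[1:])], axis=1)

# Hypothetical input: a 1 kHz tone in light noise, sampled at 8 kHz for 1 s.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
sig = np.sin(2 * np.pi * 1000 * t) + 0.1 * rng.standard_normal(8000)

spec = magnitude_spectrogram(sig)
edges = variance_band_edges(spec, n_bands=8)
pooled = pool_bands(spec, edges)
```

Because duplicate edges are merged, the number of bands returned can be smaller than `n_bands` when most of the variance sits in a few bins. A data-driven scheme of this kind spends more pooled features on the parts of the spectrum that vary most, which is the general motivation for variance-based pooling over fixed uniform bands.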

Item Type: Article
DOI/Identification number: 10.1016/j.dsp.2016.04.005
Uncontrolled keywords: Machine hearing; auditory event detection; robust auditory classification
Subjects: T Technology > T Technology (General)
Divisions: Faculties > Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 19 Apr 2016 08:34 UTC
Last Modified: 29 May 2019 17:14 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/55016 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008