A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features

Xie, Zhipeng, McLoughlin, Ian, Zhang, Haomin, Song, Yan, Xiao, Wei (2016) A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features. Digital Signal Processing, 54, pp. 119-128. ISSN 1051-2004. (doi:10.1016/j.dsp.2016.04.005) (KAR id:55016)

PDF: Author's Accepted Manuscript. Language: English. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Official URL: http://dx.doi.org/10.1016/j.dsp.2016.04.005

Abstract

Machine hearing is an emerging research field that is analogous to machine vision in that it aims to equip computers with the ability to hear and recognise a variety of sounds. It is a key enabler of natural human–computer speech interfacing, as well as of areas such as automated security surveillance, environmental monitoring, and smart homes, buildings and cities. Recent advances in machine learning allow current systems to accurately recognise a diverse range of sounds under controlled conditions. However, doing so in real-world noisy conditions remains a challenging task. Several front-end feature extraction methods have been used for machine hearing, employing speech recognition features like MFCC and PLP, as well as image-like features such as AIM and SIF. The best choice of feature is found to depend upon the noise environment and the machine learning techniques used. Machine learning methods such as deep neural networks have been shown to be capable of inferring discriminative classification rules from less structured front-end features in related domains. In the machine hearing field, spectrogram image features have recently shown good performance for noise-corrupted classification using deep neural networks. However, there are many methods of extracting features from spectrograms. This paper explores a novel data-driven feature extraction method that uses variance-based criteria to define spectral pooling of features from spectrograms. The proposed method, based on maximising the pooled spectral variance of foreground and background sound models, is shown to achieve very good performance for robust classification.
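The abstract does not spell out the pooling criterion, so the following is only an illustrative sketch of what variance-guided spectral pooling could look like, not the paper's algorithm. It assumes a magnitude spectrogram stored as a NumPy array of shape (frequency bins, time frames) and places band edges so that each pooled band holds a roughly equal share of the per-bin variance over time. The function names `variance_band_edges` and `pool_spectrogram`, the equal-share rule, and the synthetic data are all assumptions made for illustration; the foreground and background sound models mentioned in the abstract are not modelled here.

```python
import numpy as np


def variance_band_edges(spec, n_bands):
    """Pick band edges so each band holds a roughly equal share of variance.

    spec: 2-D array of shape (n_bins, n_frames).
    Returns n_bands + 1 strictly increasing bin indices from 0 to n_bins.
    NOTE: the equal-share rule is an illustrative assumption, not the
    criterion used in the paper.
    """
    n_bins = spec.shape[0]
    if not 1 <= n_bands <= n_bins:
        raise ValueError("n_bands must be between 1 and the number of bins")
    var = spec.var(axis=1) + 1e-12              # per-bin variance over time
    cum = np.cumsum(var) / var.sum()            # cumulative variance share
    targets = np.arange(1, n_bands) / n_bands   # interior share targets
    inner = np.searchsorted(cum, targets) + 1   # edge just past each target
    edges = np.concatenate(([0], inner, [n_bins]))
    # Nudge edges apart so that no band is empty.
    for i in range(1, n_bands):
        edges[i] = max(edges[i], edges[i - 1] + 1)
    for i in range(n_bands - 1, 0, -1):
        edges[i] = min(edges[i], edges[i + 1] - 1)
    return edges


def pool_spectrogram(spec, edges):
    """Average the bins inside each band; result is (n_bands, n_frames)."""
    bands = [spec[lo:hi].mean(axis=0) for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(bands)


# Synthetic example (hypothetical data): bins 20-29 vary far more than the rest.
rng = np.random.default_rng(0)
spec = 0.01 * rng.random((64, 100))
spec[20:30] += rng.random((10, 100))

edges = variance_band_edges(spec, n_bands=8)
pooled = pool_spectrogram(spec, edges)
print("band edges:", edges)
print("pooled shape:", pooled.shape)
```

With this rule, regions of the spectrum that vary strongly over time receive narrow bands and quiet regions are merged into wide ones, which is one simple way a data-driven criterion can replace a fixed pooling layout.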

Item Type:	Article
DOI/Identification number:	10.1016/j.dsp.2016.04.005
Uncontrolled keywords:	Machine hearing; auditory event detection; robust auditory classification
Subjects:	T Technology > T Technology (General)
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	19 Apr 2016 08:34 UTC
Last Modified:	05 Nov 2024 10:43 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/55016 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008