Robust Sound Event Classification using Deep Neural Networks

McLoughlin, Ian Vince, Zhang, Hao-min, Xie, Zhi-Peng, Song, Yan, Xiao, Wei (2015) Robust Sound Event Classification using Deep Neural Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23 (3). pp. 540-552. ISSN 2329-9290. (doi:10.1109/TASLP.2015.2389618) (KAR id:51341)

PDF (Robust Sound Event Classification using Deep Neural Networks): Author's Accepted Manuscript. Language: English
Official URL: http://www.dx.doi.org/10.1109/TASLP.2015.2389618

Abstract

The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have driven progress in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise.

This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task at different levels of corrupting noise, and with several system enhancements, and is shown to compare very well with current state-of-the-art classification techniques.

Item Type:	Article
DOI/Identification number:	10.1109/TASLP.2015.2389618
Additional information:	Full text upload complies with journal requirements
Uncontrolled keywords:	Machine hearing; Auditory event detection
Subjects:	T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK5101 Telecommunications > TK5102.9 Signal processing
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	02 Nov 2015 11:44 UTC
Last Modified:	08 Dec 2022 21:48 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/51341 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008