Kent Academic Repository

A robust framework for acoustic scene classification

Pham, Lam Dang, McLoughlin, Ian Vince, Phan, Huy, Palaniappan, Ramaswamy (2019) A robust framework for acoustic scene classification. In: Interspeech 2019. pp. 3634-3638. (doi:10.21437/Interspeech.2019-1841) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:91418)

Official URL:
https://www.isca-speech.org/archive/interspeech_20...

Abstract

Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log energy Mel filter, Gammatone filter and constant Q transform. At the back-end we evaluate an effective two-stage model that exploits a Convolutional Neural Network for pre-trained feature extraction, followed by Deep Neural Network classifiers as a post-trained feature adaptation model and classifier. We also explore the use of a data augmentation technique for these features that effectively generates a variety of intermediate data, reinforcing model learning abilities, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and highly competitive compared to state-of-the-art systems.
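
The abstract does not name the augmentation technique, so the following is only an illustrative sketch of one common way to generate "intermediate" training data: a mixup-style linear blend of two time-frequency feature patches and their one-hot labels. The function names (blend_examples, sample_weight), the Beta-distributed mixing weight and the value alpha=0.4 are assumptions made for illustration, not details taken from the paper.

```python
import random


def blend_examples(feat_a, label_a, feat_b, label_b, weight):
    """Linearly interpolate two feature matrices and their one-hot labels.

    `weight` is the share given to example A; (1 - weight) goes to example B.
    Hypothetical helper: the paper's actual augmentation may differ.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mixed_feat = [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]
    mixed_label = [
        weight * a + (1.0 - weight) * b for a, b in zip(label_a, label_b)
    ]
    return mixed_feat, mixed_label


def sample_weight(alpha=0.4, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


# Two toy 2x3 "spectrogram" patches from different scene classes.
spec_a = [[0.0, 2.0, 4.0], [6.0, 8.0, 10.0]]
spec_b = [[4.0, 2.0, 0.0], [2.0, 0.0, 2.0]]
label_a = [1.0, 0.0]  # one-hot label for class 0
label_b = [0.0, 1.0]  # one-hot label for class 1

# A fixed weight keeps the example deterministic.
mixed_spec, mixed_label = blend_examples(spec_a, label_a, spec_b, label_b, 0.75)
print(mixed_spec)   # blended features lying between the two inputs
print(mixed_label)  # soft label reflecting the same mixing ratio
```

In training, the fixed weight would typically be replaced by a fresh draw from sample_weight() for each pair, so that the model sees many intermediate points between class examples.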

Item Type: Conference or workshop item (Proceeding)
DOI/Identification number: 10.21437/Interspeech.2019-1841
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Palaniappan Ramaswamy
Date Deposited: 08 Nov 2021 11:20 UTC
Last Modified: 04 Mar 2024 16:51 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/91418 (The current URI for this page, for reference purposes)

University of Kent Author Information

Pham, Lam Dang.


McLoughlin, Ian Vince.

Creator's ORCID: https://orcid.org/0000-0001-7111-2008

Phan, Huy.

Creator's ORCID: https://orcid.org/0000-0003-4096-785X

Palaniappan, Ramaswamy.

Creator's ORCID: https://orcid.org/0000-0001-5296-8396