CNN-LTE: A Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Classification

Phan, Huy, Koch, Philipp, Hertel, Lars, Maass, Marco, Mazur, Radoslaw, Mertins, Alfred (2017) CNN-LTE: A Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Classification. In: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings. pp. 136-140. IEEE, New Orleans, USA ISBN 978-1-5090-4117-6. (doi:10.1109/ICASSP.2017.7952133) (KAR id:72673)

PDF: Author's Accepted Manuscript. Language: English
Official URL: https://doi.org/10.1109/ICASSP.2017.7952133

Abstract

We present an approach to audio scene classification. First, given the label set of the scenes, a label tree is automatically constructed in which the labels are grouped into meta-classes. This category taxonomy is then used in the feature extraction step, in which an audio scene instance is transformed into a label tree embedding image. Elements of the image indicate the likelihoods that the scene instances belong to different meta-classes. Finally, a class of simple 1-X (i.e., 1-max, 1-mean, and 1-mix) pooling convolutional neural networks, tailored to the task at hand, is learned on top of the image features for scene recognition. Experimental results on the DCASE 2013 and DCASE 2016 datasets demonstrate the efficiency of the proposed method.
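
To make the pooling idea in the abstract concrete, here is a minimal sketch in plain Python of 1-max and 1-mean pooling, where each convolutional feature map is reduced to a single value. Everything beyond the pooling names is an assumption made for illustration: the function names, the full-height kernel sliding along the time axis, the ReLU activation, and the toy numbers are not taken from the paper. The 1-mix variant is not defined on this page, so it is left out rather than guessed.

```python
from typing import List, Sequence


def conv_feature_map(image: Sequence[Sequence[float]],
                     kernel: Sequence[Sequence[float]]) -> List[float]:
    """Slide a full-height kernel along the column (time) axis.

    Hypothetical helper: valid convolution, stride 1, followed by ReLU.
    Rows of `image` stand in for meta-class likelihoods, columns for time.
    """
    n_rows, n_cols = len(image), len(image[0])
    width = len(kernel[0])
    if len(kernel) != n_rows:
        raise ValueError("kernel must span all rows of the image")
    feature_map = []
    for t in range(n_cols - width + 1):
        s = 0.0
        for r in range(n_rows):
            for k in range(width):
                s += image[r][t + k] * kernel[r][k]
        feature_map.append(max(s, 0.0))  # ReLU (an assumption)
    return feature_map


def one_max_pool(feature_map: Sequence[float]) -> float:
    """1-max pooling: keep only the largest activation of the feature map."""
    return max(feature_map)


def one_mean_pool(feature_map: Sequence[float]) -> float:
    """1-mean pooling: keep only the average activation of the feature map."""
    return sum(feature_map) / len(feature_map)


# Toy 2 x 4 "embedding image" and a 2 x 2 kernel (made-up values).
image = [[0.25, 0.75, 0.5, 0.0],
         [0.75, 0.25, 0.5, 1.0]]
kernel = [[1.0, 0.0],
          [0.0, 1.0]]

fm = conv_feature_map(image, kernel)
print("feature map:", fm)
print("1-max :", one_max_pool(fm))
print("1-mean:", one_mean_pool(fm))
```

Because each feature map collapses to one number, the pooled feature vector has one entry per filter regardless of how many time steps the input image has.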

Item Type:	Conference or workshop item (Proceeding)
DOI/Identification number:	10.1109/ICASSP.2017.7952133
Uncontrolled keywords:	audio scene classification, convolutional neural network, label tree embedding, pooling
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Huy Phan
Date Deposited:	25 Feb 2019 15:23 UTC
Last Modified:	20 May 2025 10:23 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/72673 (The current URI for this page, for reference purposes)

University of Kent Author Information

Phan, Huy.

Creator's ORCID:	https://orcid.org/0000-0003-4096-785X