Representing Nonspeech Audio Signals through Speech Classification Models

Phan, Huy, Hertel, Lars, Maass, Marco, Mazur, Radoslaw, Mertins, Alfred (2015) Representing Nonspeech Audio Signals through Speech Classification Models. In: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015). ISCA, Dresden, Germany (KAR id:72688)

PDF (Author's Accepted Manuscript), Language: English
Official URL: https://www.isca-speech.org/archive/interspeech_20...

Abstract

The human auditory system is very well matched to both human speech and environmental sounds. Therefore, the question arises whether human speech material may provide useful information for training systems for analyzing nonspeech audio signals, such as in a recognition task. To find out how similar nonspeech signals are to speech, we measure the closeness between target nonspeech signals and different basis speech categories via a speech classification model. The speech similarities are finally employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically with a tree structure. We conduct experiments for the audio event analysis application by using speech words from the TIMIT dataset to learn the descriptors for the audio events of the Freiburg-106 dataset. Our results on the event recognition task outperform those achieved by the best system even though a simple linear classifier is used. Furthermore, integrating the learned descriptors as an additional source leads to improved performance.
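The descriptor idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the speech classification model is replaced here by a simple nearest-centroid model with a softmax over negative distances, the features are synthetic random vectors rather than TIMIT or Freiburg-106 data, and the function names `fit_centroids` and `speech_similarity_descriptor` are hypothetical. It shows only the general pattern of scoring a nonspeech signal against speech categories and using the scores as its representation, not the authors' actual method or the hierarchical tree variant.

```python
import numpy as np

def fit_centroids(speech_features, speech_labels):
    """Stand-in speech model: one mean feature vector per speech category."""
    classes = np.unique(speech_labels)
    centroids = np.stack(
        [speech_features[speech_labels == c].mean(axis=0) for c in classes]
    )
    return classes, centroids

def speech_similarity_descriptor(frames, centroids):
    """Describe a nonspeech signal by its similarity to each speech category."""
    # Squared Euclidean distance from every frame to every category centroid.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Softmax over negative distances yields per-frame pseudo-posteriors.
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Average over frames: one similarity value per speech category.
    return probs.mean(axis=0)

rng = np.random.default_rng(0)

# Synthetic "speech" data: 3 hypothetical categories, 4-dimensional features.
speech_labels = np.repeat(np.arange(3), 50)
speech_features = rng.normal(size=(150, 4)) + 3.0 * np.eye(3, 4)[speech_labels]
classes, centroids = fit_centroids(speech_features, speech_labels)

# Synthetic "nonspeech" event: 20 frames that happen to lie near category 1.
event_frames = rng.normal(size=(20, 4)) + 3.0 * np.eye(3, 4)[1]
descriptor = speech_similarity_descriptor(event_frames, centroids)
print(descriptor)
```

In this sketch the descriptor has one entry per speech category and its entries sum to one. Descriptors computed this way for many audio events could then be passed to any linear classifier for the recognition step.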

Item Type:	Conference proceeding
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Huy Phan
Date Deposited:	25 Feb 2019 16:38 UTC
Last Modified:	20 May 2025 10:23 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/72688

University of Kent Author Information

Phan, Huy.

Creator's ORCID:	https://orcid.org/0000-0003-4096-785X