Super-Audible Voice Activity Detection

McLoughlin, Ian Vince (2014) Super-Audible Voice Activity Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22 (9). pp. 1424-1433. (doi:10.1109/TASLP.2014.2335055) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:48837)

Official URL:
http://dx.doi.org/10.1109/TASLP.2014.2335055

Abstract

In this paper, reflected sound of frequency just above the audible range is used to detect speech activity. The active signal used is inaudible to humans, readily generated by the typical audio circuitry and components found in mobile telephones, and is robust to background sounds such as nearby voices. In use, the system relies upon a wideband excitation signal emitted from a loudspeaker located near the lips, which reflects from the mouth region and is then captured by a nearby microphone. The state of the lip opening is evaluated periodically by tracking the resonance patterns in the reflected excitation signal. When the lips are open, deep and complex resonances are formed as energy propagates into and then reflects out from the open mouth and vocal tract, with resonance depth being related to the open lip area. When the lips are closed, these resonance patterns are absent. The presence of the resonances can thus serve as a low complexity detection measure. The technique is evaluated for multiple users in terms of sensitivity to source placement and sensor placement. Voice activity detection performance using this measure is further evaluated in the presence of realistic wideband acoustic background noise, as well as artificially added noise. The system is shown to be relatively insensitive to sensor placement, highly insensitive to background noise, and able to achieve greater than 90% voice activity detection accuracy. The technique is even suitable when a subject is whispering in the presence of much louder multi-speaker babble. The technique has potential for speech-based systems operating in high noise environments as well as in silent speech interfaces, whisper-input systems and voice prostheses for speech-impaired users.
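The detection idea in the abstract (resonances present in the reflected excitation when the lips are open, absent when closed) can be illustrated with a minimal sketch. It is not the paper's implementation: the 48 kHz sample rate, the 20 to 23 kHz band, the peak-to-trough depth measure, the 6 dB threshold, and all function names are assumptions chosen for illustration, and the test frames are synthetic rather than recorded.

```python
import cmath
import math

SAMPLE_RATE = 48_000      # Hz; assumed, not taken from the paper
BAND = (20_000, 23_000)   # Hz; assumed "just above audible" excitation band
FRAME_LEN = 256           # samples per analysis frame; assumed


def band_magnitudes_db(frame, sample_rate=SAMPLE_RATE, band=BAND):
    """Magnitudes (dB) of the DFT bins inside `band`, using a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if band[0] <= freq <= band[1]:
            acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                      for t, x in enumerate(frame))
            mags.append(20.0 * math.log10(abs(acc) + 1e-12))
    return mags


def resonance_depth_db(mags_db):
    """Peak-to-trough spread of the in-band spectrum (a stand-in depth measure)."""
    return max(mags_db) - min(mags_db)


def lips_open(frame, threshold_db=6.0):
    """Flag the frame as 'lips open' when the in-band spread exceeds a threshold."""
    return resonance_depth_db(band_magnitudes_db(frame)) > threshold_db


def synth_frame(bin_gains, n=FRAME_LEN):
    """Synthetic reflected signal: a sum of bin-centred cosines with given gains."""
    return [sum(g * math.cos(2 * math.pi * k * t / n) for k, g in bin_gains.items())
            for t in range(n)]


# Bins 107..122 fall inside the assumed band for a 256-sample frame at 48 kHz.
flat = synth_frame({k: 1.0 for k in range(107, 123)})            # no resonance
notched = synth_frame({k: (0.1 if k == 115 else 1.0)             # one 20 dB dip
                       for k in range(107, 123)})

print(lips_open(flat), lips_open(notched))
```

In this toy setting the flat spectrum stands in for a closed-lip reflection and the notched spectrum for an open-lip one. A practical system would need to compensate for the loudspeaker and microphone responses before measuring depth, which this sketch ignores.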

Item Type: Article
DOI/Identification number: 10.1109/TASLP.2014.2335055
Uncontrolled keywords: lip state detection, mouth state detection, silent speech interfaces, speech activity detection, voice activity detection, voice operated switch
Subjects: T Technology
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 25 Aug 2015 08:45 UTC
Last Modified: 17 Aug 2022 10:58 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48837 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian Vince.

Creator's ORCID: https://orcid.org/0000-0001-7111-2008
