
Super-Audible Voice Activity Detection

McLoughlin, Ian Vince (2014) Super-Audible Voice Activity Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22 (9). pp. 1424-1433. (doi:10.1109/TASLP.2014.2335055) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided)

Official URL
http://dx.doi.org/10.1109/TASLP.2014.2335055

Abstract

In this paper, reflected sound of frequency just above the audible range is used to detect speech activity. The active signal used is inaudible to humans, readily generated by the typical audio circuitry and components found in mobile telephones, and is robust to background sounds such as nearby voices. In use, the system relies upon a wideband excitation signal emitted from a loudspeaker located near the lips, which reflects from the mouth region and is then captured by a nearby microphone. The state of the lip opening is evaluated periodically by tracking the resonance patterns in the reflected excitation signal. When the lips are open, deep and complex resonances are formed as energy propagates into and then reflects out from the open mouth and vocal tract, with resonance depth being related to the open lip area. When the lips are closed, these resonance patterns are absent. The presence of the resonances can thus serve as a low complexity detection measure. The technique is evaluated for multiple users in terms of sensitivity to source placement and sensor placement. Voice activity detection performance using this measure is further evaluated in the presence of realistic wideband acoustic background noise, as well as artificially added noise. The system is shown to be relatively insensitive to sensor placement, highly insensitive to background noise, and able to achieve greater than 90% voice activity detection accuracy. The technique is even suitable when a subject is whispering in the presence of much louder multi-speaker babble. The technique has potential for speech-based systems operating in high noise environments as well as in silent speech interfaces, whisper-input systems and voice prostheses for speech-impaired users.
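
The detection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the sample rate, frame length, analysis band, or decision rule, so the values below (48 kHz sampling, 10 ms frames, a 20.0 to 23.9 kHz band, a 6 dB threshold) are assumptions chosen only for illustration. The peak-to-trough spread of the in-band spectrum is used here as a simple stand-in for "resonance depth", and `synth_reflection` is a hypothetical helper that fabricates a test signal rather than modelling a real mouth.

```python
import cmath
import math

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 48_000          # Hz
FRAME_LEN = 480               # 10 ms frame, giving 100 Hz DFT bin spacing
BAND_HZ = (20_000, 23_900)    # assumed super-audible analysis band
DEPTH_THRESHOLD_DB = 6.0      # assumed decision threshold


def band_spectrum_db(frame, sample_rate=SAMPLE_RATE, band_hz=BAND_HZ):
    """Magnitude spectrum (dB) of the DFT bins that fall inside band_hz."""
    n = len(frame)
    k_lo = math.ceil(band_hz[0] * n / sample_rate)
    k_hi = math.floor(band_hz[1] * n / sample_rate)
    spectrum = []
    for k in range(k_lo, k_hi + 1):
        # Direct DFT of a single bin; slow but dependency-free.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(20.0 * math.log10(max(abs(acc), 1e-12)))
    return spectrum


def resonance_depth_db(spectrum_db):
    """Peak-to-trough spread of the in-band spectrum, in dB."""
    return max(spectrum_db) - min(spectrum_db)


def lips_open(frame, threshold_db=DEPTH_THRESHOLD_DB):
    """Flag a frame as 'lips open' when the in-band spectrum is not flat."""
    return resonance_depth_db(band_spectrum_db(frame)) > threshold_db


def synth_reflection(gain_for_bin, n=FRAME_LEN, bins=range(200, 240)):
    """Hypothetical test signal: one cosine per in-band bin, scaled by a gain."""
    return [sum(gain_for_bin(k) * math.cos(2 * math.pi * k * i / n)
                for k in bins)
            for i in range(n)]


# Flat reflection (stand-in for closed lips) versus a reflection with a
# 20 dB notch over a few bins (stand-in for an open-mouth resonance).
closed_frame = synth_reflection(lambda k: 1.0)
open_frame = synth_reflection(lambda k: 0.1 if 215 <= k <= 220 else 1.0)
print(lips_open(closed_frame), lips_open(open_frame))
```

A practical system would need to compensate for the loudspeaker and microphone responses before measuring spectral flatness, and would likely smooth the decision over several frames; both steps are omitted from this sketch.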

Item Type: Article
DOI/Identification number: 10.1109/TASLP.2014.2335055
Uncontrolled keywords: lip state detection, mouth state detection, silent speech interfaces, speech activity detection, voice activity detection, voice operated switch
Subjects: T Technology
Divisions: Faculties > Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 25 Aug 2015 08:45 UTC
Last Modified: 29 May 2019 14:39 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48837 (The current URI for this page, for reference purposes)
McLoughlin, Ian Vince: https://orcid.org/0000-0001-7111-2008