McLoughlin, Ian Vince (2014) Super-Audible Voice Activity Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22 (9). pp. 1424-1433. (doi:10.1109/TASLP.2014.2335055) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:48837)
Official URL: http://dx.doi.org/10.1109/TASLP.2014.2335055
Abstract
In this paper, reflected sound of frequency just above the audible range is used to detect speech activity. The active signal used is inaudible to humans, readily generated by the typical audio circuitry and components found in mobile telephones, and is robust to background sounds such as nearby voices. In use, the system relies upon a wideband excitation signal emitted from a loudspeaker located near the lips, which reflects from the mouth region and is then captured by a nearby microphone. The state of the lip opening is evaluated periodically by tracking the resonance patterns in the reflected excitation signal. When the lips are open, deep and complex resonances are formed as energy propagates into and then reflects out from the open mouth and vocal tract, with resonance depth being related to the open lip area. When the lips are closed, these resonance patterns are absent. The presence of the resonances can thus serve as a low complexity detection measure. The technique is evaluated for multiple users in terms of sensitivity to source placement and sensor placement. Voice activity detection performance using this measure is further evaluated in the presence of realistic wideband acoustic background noise, as well as artificially added noise. The system is shown to be relatively insensitive to sensor placement, highly insensitive to background noise, and able to achieve greater than 90% voice activity detection accuracy. The technique is even suitable when a subject is whispering in the presence of much louder multi-speaker babble. The technique has potential for speech-based systems operating in high noise environments as well as in silent speech interfaces, whisper-input systems and voice prostheses for speech-impaired users.
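The detection idea described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the sample rate, excitation band, frame length, threshold, and the single-echo reflection model are all assumptions chosen for illustration, and every function name here is hypothetical. The sketch emits a chirp just above the audible range, estimates the reflection's in-band frequency response, and treats a large peak-to-trough spread as a sign that resonances (open lips) are present.

```python
import numpy as np

FS = 96_000              # sample rate in Hz (assumed, not from the paper)
BAND = (20_000, 24_000)  # excitation band in Hz (assumed)
FRAME = 4096             # analysis frame length in samples (assumed)
THRESHOLD_DB = 6.0       # decision threshold in dB (assumed)


def make_excitation(n=FRAME, fs=FS, band=BAND):
    """Linear chirp sweeping the band: one simple wideband excitation."""
    t = np.arange(n) / fs
    f0, f1 = band
    sweep_rate = (f1 - f0) / (n / fs)  # Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * sweep_rate * t ** 2))


def resonance_depth_db(received, excitation, fs=FS, band=BAND, margin=500.0):
    """Peak-to-trough spread (dB) of the estimated in-band response."""
    freqs = np.fft.rfftfreq(len(excitation), 1 / fs)
    # Skip the band edges, where the chirp carries little energy.
    mask = (freqs >= band[0] + margin) & (freqs <= band[1] - margin)
    # Dividing by the excitation spectrum removes the chirp's own shape.
    response = np.fft.rfft(received)[mask] / np.fft.rfft(excitation)[mask]
    level_db = 20 * np.log10(np.abs(response) + 1e-12)
    return float(level_db.max() - level_db.min())


def lips_open(received, excitation, threshold_db=THRESHOLD_DB):
    """Declare 'open' when the resonance depth exceeds the threshold."""
    return resonance_depth_db(received, excitation) > threshold_db


def simulate_reflection(excitation, lips_are_open, rng):
    """Toy reflection: a delayed echo stands in for mouth-cavity resonance."""
    echo = 0.6 * np.roll(excitation, 96) if lips_are_open else 0.0
    noise = 0.01 * rng.standard_normal(len(excitation))
    return excitation + echo + noise


rng = np.random.default_rng(0)
excitation = make_excitation()
open_frame = simulate_reflection(excitation, True, rng)
closed_frame = simulate_reflection(excitation, False, rng)
print(lips_open(open_frame, excitation), lips_open(closed_frame, excitation))
```

In this toy model the echo creates comb-like peaks and notches across the band, so the open-lip frame shows a much larger spread than the flat closed-lip frame. A real system would need to handle the loudspeaker and microphone responses, movement, and frame-to-frame smoothing, none of which is modelled here.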
Item Type: | Article |
---|---|
DOI/Identification number: | 10.1109/TASLP.2014.2335055 |
Uncontrolled keywords: | lip state detection, mouth state detection, silent speech interfaces, speech activity detection, voice activity detection, voice operated switch |
Subjects: | T Technology |
Divisions: | Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing |
Depositing User: | Ian McLoughlin |
Date Deposited: | 25 Aug 2015 08:45 UTC |
Last Modified: | 05 Nov 2024 10:33 UTC |
Resource URI: | https://kar.kent.ac.uk/id/eprint/48837 (The current URI for this page, for reference purposes) |