Skip to main content
Kent Academic Repository

Exploring Audio Sensing in Detecting Social Interactions Using Smartphone Devices

Baker, Jon (2020) Exploring Audio Sensing in Detecting Social Interactions Using Smartphone Devices. Doctor of Philosophy (PhD) thesis, University of Kent,. (KAR id:83539)


In recent years, the fast proliferation of smartphones devices has provided powerful and portable methodologies for integrating sensing systems which can run continuously and provide feedback in real-time. The mobile crowd-sensing of human behaviour is an emerging computing paradigm that offers a challenge of sensing everyday social interactions performed by people who carry smartphone devices upon themselves. Typical smartphone sensors and the mobile crowd-sensing paradigm compose a process where the sensors present, such as the microphone, are used to infer social relationships between people in diverse social settings, where environmental factors can be dynamic and the infrastructure of buildings can vary.

The typical approaches in detecting social interactions between people consider the use of co-location as a proxy for real-world interactions. Such approaches can under-perform in challenging situations where multiple social interactions can occur within close proximity to each other, for example when people are in a queue at the supermarket but not a part of the same social interaction. Other approaches involve a limitation where all participants of a social interaction must carry a smartphone device with themselves at all times and each smartphone must have the sensing app installed. The problem here is the feasibility of the sensing system, which relies heavily on each participant's smartphone acting as nodes within a social graph, connected together with weighted edges of proximity between the devices; when users uninstall the app or disable background sensing, the system is unable to accurately determine the correct number of participants.

In this thesis, we present two novel approaches to detecting co-located social interac- tions using smartphones. The first relies on the use of WiFi signals and audio signals

to distinguish social groups interacting within a few meters from each other with 88% precision. We orchestrated preliminary experiments using WiFi as a proxy for co-location between people who are socially interacting. Initial results showed that in more challenging scenarios, WiFi is not accurate enough to determine if people are socially interacting within the same social group. We then made use of audio as a second modality to capture the sound patterns of conversations to identify and segment social groups within close proximity to each other. Through a range of real-world experiments (social interactions in meeting scenarios, coffee shop scenarios, conference scenarios), we demonstrate a technique that utilises WiFi fingerprinting, along with sound fingerprinting to identify these social groups. We built a system which performs well, and then optimized the power consumption and improved the performance to 88% precision in the most challenging scenarios using duty cycling and data averaging techniques.

The second approach explores the feasibility of detecting social interactions without the need of all social contacts to carry a social sensing device. This work explores the use of supervised and unsupervised Deep Learning techniques before concluding on the use of an Autoencoder model to perform a Speaker Identification task. We demonstrate how machine learning can be used with the audio data collected from a singular device as a speaker identification framework. Speech from people is used as the input to our Autoencoder model and then classified against a list of "social contacts" to determine if the user has spoken a person before or not. By doing this, the system can count the number of social contacts belonging to the user, and develop a database of common social contacts. Through the use 100 randomly-generated social conversations and the use of state-of-the-art Deep Learning techniques, we demonstrate how this system can accurately distinguish new and existing speakers from a data set of voices, to count the number of daily social interactions a user encounters with a precision of 75%. We then optimize the model using Hyperparameter Optimization to ensure that the model is most optimal for the task. Unlike most systems in the literature, this approach would work without the need to modify the existing infrastructure of a building, and without all participants needing to install the same app

Item Type: Thesis (Doctor of Philosophy (PhD))
Thesis advisor: Efstratious, Christos
Uncontrolled keywords: sensing smartphones android social interactions
Subjects: Q Science
T Technology
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Engineering and Digital Arts
SWORD Depositor: System Moodle
Depositing User: System Moodle
Date Deposited: 19 Oct 2020 10:10 UTC
Last Modified: 09 Dec 2022 05:10 UTC
Resource URI: (The current URI for this page, for reference purposes)

University of Kent Author Information

Baker, Jon.

Creator's ORCID:
CReDIT Contributor Roles:
  • Depositors only (login required):

Total unique views for this document in KAR since July 2020. For more details click on the image.