Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention

Sun, Zhongtian, Harit, Anoushka, Cristea, Alexandra I., Yu, Jialin, Moubayed, Noura Al, Shi, Lei (2023) Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention. In: 2022 IEEE International Conference on Big Data. IEEE ISBN 978-1-6654-8045-1. (doi:10.1109/BigData55660.2022.10020791) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:108674)

Official URL: https://doi.org/10.1109/BigData55660.2022.10020791

Abstract

Medical visual question answering (Med-VQA) aims to answer medical questions based on the clinical images provided. This field is still in its infancy due to the complexity of the trio formed by questions, multimodal features and expert knowledge. In this paper, we tackle a 'myth' in the Natural Language Processing area: that unimodal bias is always undesirable in learning models. Additionally, we study the effect of integrating a novel dynamic attention mechanism into such models, inspired by a recent graph deep learning study. Unlike traditional attention, dynamic attention scores are conditioned on different query words in a question and thus enhance the representation learning ability of texts. We propose that some questions are answered more accurately with a reinforcement of question embedding after fusing multimodal features. Extensive experiments were conducted on the VQA-RAD dataset and demonstrate that our proposed model, reinforCe unimOdal dynamiC Attention (COCA), outperforms the state-of-the-art methods overall and performs competitively at open-ended question answering.
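
The abstract contrasts traditional attention with dynamic attention whose scores depend on the query word. The full text is not available from this record, so the sketch below is not the authors' COCA model. It is a minimal illustration that assumes one common formulation from the graph deep learning literature (GAT-style "static" scoring versus GATv2-style "dynamic" scoring). All names, dimensions and random weights (`static_attention`, `dynamic_attention`, `W_static`, `W_dyn`, and so on) are hypothetical.

```python
import math
import random

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def static_attention(query, keys, W, a_q, a_k):
    # Assumed GAT-style score: LeakyReLU(a_q . Wq + a_k . Wk).
    # The query term is a constant offset shared by all keys, so the
    # ranking of keys cannot change from one query word to another.
    s_q = dot(a_q, matvec(W, query))
    return softmax([leaky_relu(s_q + dot(a_k, matvec(W, k))) for k in keys])

def dynamic_attention(query, keys, W, a):
    # Assumed GATv2-style score: a . LeakyReLU(W [q || k]).
    # The nonlinearity sits between the two linear maps, so the
    # ranking of keys is allowed to depend on the query word.
    scores = []
    for k in keys:
        hidden = [leaky_relu(h) for h in matvec(W, query + k)]
        scores.append(dot(a, hidden))
    return softmax(scores)

def ranking(weights):
    # Key indices ordered from most to least attended.
    return sorted(range(len(weights)), key=lambda i: -weights[i])

random.seed(0)
d, h, n = 3, 4, 4  # toy embedding size, hidden size, number of words

def rand_vec(size):
    return [random.gauss(0, 1) for _ in range(size)]

words = [rand_vec(d) for _ in range(n)]        # toy question-word embeddings
W_static = [rand_vec(d) for _ in range(h)]     # h x d
a_q, a_k = rand_vec(h), rand_vec(h)
W_dyn = [rand_vec(2 * d) for _ in range(h)]    # h x 2d
a_dyn = rand_vec(h)

static_rankings = [
    ranking(static_attention(q, words, W_static, a_q, a_k)) for q in words
]
dynamic_rankings = [
    ranking(dynamic_attention(q, words, W_dyn, a_dyn)) for q in words
]

print("static :", static_rankings)
print("dynamic:", dynamic_rankings)
```

Under the static form every query word produces the same ordering of keys, because the query only shifts all scores by a shared amount before a monotonic nonlinearity. Under the dynamic form the ordering may differ from one query word to the next, which is the property the abstract attributes to dynamic attention.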

Item Type:	Conference proceeding
DOI/Identification number:	10.1109/BigData55660.2022.10020791
Subjects:	Q Science > Q Science (General) > Q335 Artificial intelligence
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Zhongtian Sun
Date Deposited:	06 Feb 2025 16:19 UTC
Last Modified:	29 Apr 2026 10:32 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/108674 (The current URI for this page, for reference purposes)

University of Kent Author Information

Sun, Zhongtian.

Creator's ORCID:	https://orcid.org/0000-0003-0489-5203
