Kent Academic Repository

Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention

Sun, Zhongtian, Harit, Anoushka, Cristea, Alexandra I., Yu, Jialin, Moubayed, Noura Al, Shi, Lei (2023) Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention. In: 2022 IEEE International Conference on Big Data. pp. 5352-5360. IEEE ISBN 978-1-6654-8045-1. (doi:10.1109/BigData55660.2022.10020791) (The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided) (KAR id:108674)

Official URL:
https://doi.org/10.1109/BigData55660.2022.10020791

Abstract

Medical visual question answering (Med-VQA) is the task of answering medical questions based on the clinical images provided. This field is still in its infancy due to the complexity of the trio formed of questions, multimodal features and expert knowledge. In this paper, we tackle a ‘myth’ in the Natural Language Processing area: that unimodal bias is always considered undesirable in learning models. Additionally, we study the effect of integrating a novel dynamic attention mechanism into such models, inspired by a recent graph deep learning study. Unlike traditional attention, dynamic attention scores are conditioned on different query words in a question and thus enhance the representation learning ability of texts. We propose that some questions are answered more accurately with a reinforcement of question embedding after fusing multimodal features. Extensive experiments on the VQA-RAD dataset demonstrate that our proposed model, reinforCe unimOdal dynamiC Attention (COCA), outperforms the state-of-the-art methods overall and performs competitively at open-ended question answering.

Item Type: Conference or workshop item (Proceeding)
DOI/Identification number: 10.1109/BigData55660.2022.10020791
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Zhongtian Sun
Date Deposited: 06 Feb 2025 16:19 UTC
Last Modified: 10 Feb 2025 22:13 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/108674 (The current URI for this page, for reference purposes)
