A Survey of Deep Learning Solutions for Multimedia Visual Content Analysis

Nadeem, Muhammad Shahroz, Franqueira, Virginia N.L., Zhai, Xiaojun, Kurugollu, Fatih (2019) A Survey of Deep Learning Solutions for Multimedia Visual Content Analysis. IEEE Access, 7. pp. 84003-84019. ISSN 2169-3536. (doi:10.1109/access.2019.2924733) (KAR id:77168)

Format: PDF (publisher version). Language: English. This work is licensed under a Creative Commons Attribution 4.0 International License.
Official URL: https://dx.doi.org/10.1109/access.2019.2924733

Abstract

The increasing use of social media networks on handheld devices, especially smartphones with powerful built-in cameras, and the widespread availability of fast, high-bandwidth broadband connections, together with the popularity of cloud storage, are enabling the generation and distribution of massive volumes of digital media, including images and videos. Such media is full of visual information and holds immense value in today's world. The volume of data involved calls for automated visual content analysis systems able to meet the demands of practice in terms of efficiency and effectiveness. Deep learning (DL) has recently emerged as a prominent technique for visual content analysis. It is data-driven in nature and provides automatic end-to-end learning solutions without the need to rely explicitly on predefined handcrafted feature extractors. Another appealing characteristic of DL solutions is the performance they can achieve, once the network is trained, under practical constraints. This paper identifies eight problem domains that require analysis of visual artifacts in multimedia. It surveys recent, authoritative, and best-performing DL solutions and lists the datasets used to develop these deep methods for the identified types of visual analysis problems. This paper also discusses the challenges that DL solutions face, which can compromise their reliability, robustness, and accuracy for visual content analysis.

Item Type:	Article
DOI/Identification number:	10.1109/access.2019.2924733
Uncontrolled keywords:	Visual content analysis, Deep Learning, Machine Learning, dataset
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Virginia Franqueira
Date Deposited:	16 Oct 2019 12:48 UTC
Last Modified:	20 May 2025 10:24 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/77168 (The current URI for this page, for reference purposes)

University of Kent Author Information

Franqueira, Virginia N.L.

Creator's ORCID:	https://orcid.org/0000-0003-1332-9115
