Kent Academic Repository

Fisher Vector based CNN architecture for Image Classification

Song, Yan and Wang, Peiseng and Hong, Xinhai and McLoughlin, Ian (2018) Fisher Vector based CNN architecture for Image Classification. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE. ISBN 978-1-5090-2176-5. E-ISBN 978-1-5090-2175-8. (doi:10.1109/ICIP.2017.8296344) (Access to this publication is currently restricted. You may be able to access a copy if URLs are provided) (KAR id:63217)

PDF Author's Accepted Manuscript
Language: English

Restricted to Repository staff only
Official URL:
http://dx.doi.org/10.1109/ICIP.2017.8296344

Abstract

In this paper, we tackle the representation learning problem for small-scale fine-grained object recognition and scene classification tasks. Conventional bag of features (BoF) methods exploit hand-crafted front-end local features and learn the representations via various machine learning techniques. Convolutional neural networks (CNNs) learn the representation directly from raw images and benefit from joint optimization of network parameters in an end-to-end manner. However, the performance of existing representation learning methods is still unsatisfactory for small-scale recognition tasks. To address this issue, we present a Fisher Vector (FV) coding based CNN (FV-CNN) architecture. FV-CNN has three main advantages. First, it exploits activations from the intermediate convolutional layer and a probabilistic discriminative model to derive the FV coding. Second, it takes advantage of end-to-end back-propagation of the gradients to jointly optimize the whole learning process. Finally, it can learn a compact representation. When evaluated on benchmark datasets for fine-grained object recognition (Caltech-CUB200) and scene classification (MIT67), FV-CNN achieves accuracies of 88.0% and 82.2%, respectively.
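
For readers unfamiliar with FV coding, the sketch below illustrates the general idea of encoding convolutional activations as a Fisher Vector. It is not the method of this paper: the abstract describes a probabilistic discriminative model trained end-to-end, whereas this sketch assumes the classic formulation with a fixed diagonal-covariance Gaussian mixture model (GMM), followed by power and L2 normalization. The function name fisher_vector, the GMM parameters, and the feature-map sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Classic GMM-based Fisher Vector (illustrative; not the paper's model).

    x:         (T, D) local descriptors, e.g. one D-dim activation per location
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal variances
    Returns a vector of length 2 * K * D.
    """
    T, _ = x.shape
    diff = x[:, None, :] - means[None, :, :]                      # (T, K, D)

    # Log-density of each descriptor under each diagonal Gaussian.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                             # (T, K)

    # Posterior (soft assignment) of each descriptor to each component,
    # computed with a max-shift for numerical stability.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # (T, K)

    # Gradients with respect to the means and standard deviations.
    norm_diff = diff / np.sqrt(variances)                         # (T, K, D)
    g_mu = np.einsum("tk,tkd->kd", gamma, norm_diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, norm_diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalization followed by L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Hypothetical example: a random 8-channel, 4x4 feature map and a 3-component GMM.
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 3
feature_map = rng.standard_normal((C, H, W))
descriptors = feature_map.reshape(C, H * W).T                     # (16, 8)

gmm_weights = np.full(K, 1.0 / K)
gmm_means = rng.standard_normal((K, C))
gmm_variances = np.ones((K, C))

fv = fisher_vector(descriptors, gmm_weights, gmm_means, gmm_variances)
```

Each spatial location of the feature map is treated as one local descriptor, so the image is summarized by a single vector of length 2 * K * C regardless of the feature-map resolution. In an end-to-end setting such as the one the abstract describes, the coding parameters would be learned jointly with the network rather than fixed in advance as they are here.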

Item Type: Book section
DOI/Identification number: 10.1109/ICIP.2017.8296344
Uncontrolled keywords: Image Classification, Visual Representation, Convolutional Neural Network, End-to-End Training
Subjects: T Technology
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Ian McLoughlin
Date Deposited: 04 Sep 2017 15:13 UTC
Last Modified: 05 Nov 2024 10:58 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/63217 (The current URI for this page, for reference purposes)
