Image Classification with CNN-based Fisher Vector Coding

Song, Yan and Hong, Xinhai and McLoughlin, Ian and Dai, Lirong (2017) Image Classification with CNN-based Fisher Vector Coding. In: 2016 Visual Communications and Image Processing (VCIP). IEEE. ISBN 978-1-5090-5317-9. E-ISBN 978-1-5090-5316-2. (doi:10.1109/VCIP.2016.7805494) (KAR id:57115)

Author's Accepted Manuscript (PDF). Language: English. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Official URL: https://dx.doi.org/10.1109/VCIP.2016.7805494
Additional URLs: http://www.vcip2016.org/index.asp

Abstract

Fisher vector coding methods have been demonstrated to be effective for image classification. With the help of convolutional neural networks (CNNs), several Fisher vector coding methods have shown state-of-the-art performance by adopting the activations of a single fully-connected layer as region features. These methods generally use a diagonal Gaussian mixture model (GMM) to describe the generative process of region features. However, it is difficult to model the complex distribution of a high-dimensional feature space with a limited number of Gaussians obtained by unsupervised learning. Simply increasing the number of Gaussians turns out to be inefficient and computationally impractical.
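
For reference, the GMM-based coding described above is commonly computed along the following lines. This is a minimal NumPy sketch of a standard diagonal-GMM Fisher vector, not the authors' implementation; the function names, the power and L2 normalisation step, and the toy inputs are assumptions that do not come from the abstract.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Soft assignments gamma[n, k] of N region features to K diagonal Gaussians.

    X: (N, D) features; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]                      # (N, K, D)
    # Log-density of each feature under each diagonal Gaussian.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_joint = log_prob + np.log(weights)[None, :]
    log_joint -= log_joint.max(axis=1, keepdims=True)             # numerical stability
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)

def fisher_vector(X, weights, means, variances):
    """Gradient statistics w.r.t. the GMM means and standard deviations."""
    N = X.shape[0]
    gamma = gmm_posteriors(X, weights, means, variances)          # (N, K)
    u = (X[:, None, :] - means[None, :, :]) / np.sqrt(variances)[None, :, :]
    # First-order (mean) and second-order (variance) terms per Gaussian.
    g_mu = np.einsum("nk,nkd->kd", gamma, u) / (N * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("nk,nkd->kd", gamma, u ** 2 - 1.0) / (
        N * np.sqrt(2.0 * weights)[:, None]
    )
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])          # length 2 * K * D
    # Assumed post-processing: signed square root, then L2 normalisation.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                                 # toy region features
    w = np.array([0.25, 0.25, 0.5])
    mu = rng.normal(size=(3, 8))
    var = np.ones((3, 8))
    print(fisher_vector(X, w, mu, var).shape)
```

The encoding has 2KD dimensions for K Gaussians and D-dimensional region features, so its size and cost grow linearly with the number of Gaussians, which is one way to see why simply adding Gaussians becomes impractical for high-dimensional CNN activations.
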

To address this issue, we re-interpret a pre-trained CNN as a probabilistic discriminative model, and present a CNN-based Fisher vector coding method, termed CNN-FVC. Specifically, activations of the intermediate fully-connected and output soft-max layers are exploited to derive the posteriors, mean and covariance parameters for Fisher vector coding implicitly. To further improve efficiency, we convert the pre-trained CNN to a fully convolutional one to extract the region features. Extensive experiments have been conducted on two standard scene benchmarks (i.e. SUN397 and MIT67) to evaluate the effectiveness of the proposed method. Classification accuracies of 60.7% and 82.1% are achieved on the SUN397 and MIT67 benchmarks respectively, outperforming previous state-of-the-art approaches. Furthermore, the method is complementary to GMM-FVC methods, allowing a simple fusion scheme to further improve performance to 61.1% and 83.1% respectively.
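
The conversion to a fully convolutional network mentioned above typically relies on the fact that a fully-connected layer is equivalent to a convolution whose kernel covers the layer's whole input. The sketch below checks that equivalence in NumPy. The layer sizes and function names are hypothetical, and the abstract does not state which layers the authors convert, so this illustrates the general technique rather than their network.

```python
import numpy as np

def fc_layer(x, W, b):
    """Ordinary fully-connected layer on one (C, kh, kw) input, flattened in C order."""
    return W @ x.ravel() + b

def fc_as_conv(feature_map, W, b, kh, kw):
    """Apply the same FC weights as a convolution (stride 1, no padding).

    feature_map: (C, H, Wd); W: (out, C * kh * kw); b: (out,).
    Returns (out, H - kh + 1, Wd - kw + 1): one feature vector per region.
    """
    C, H, Wd = feature_map.shape
    out = W.shape[0]
    kernel = W.reshape(out, C, kh, kw)            # FC weights viewed as a conv kernel
    oh, ow = H - kh + 1, Wd - kw + 1
    result = np.empty((out, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = feature_map[:, i:i + kh, j:j + kw]
            # Contract over channel and both spatial axes of the kernel.
            result[:, i, j] = np.tensordot(kernel, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return result

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, kh, kw, out = 3, 2, 2, 5                   # hypothetical layer sizes
    W = rng.normal(size=(out, C * kh * kw))
    b = rng.normal(size=out)
    fmap = rng.normal(size=(C, 4, 5))             # larger input than the FC layer expects
    dense = fc_as_conv(fmap, W, b, kh, kw)
    # Every spatial position matches the FC layer applied to that region alone.
    print(np.allclose(dense[:, 1, 2], fc_layer(fmap[:, 1:3, 2:4], W, b)))
```

Under this view, one forward pass over a larger image yields a grid of activations, each of which can serve as the feature of one image region, instead of running the network separately on every cropped region.
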

Item Type:	Book section
DOI/Identification number:	10.1109/VCIP.2016.7805494
Additional information:	Received a best paper award
Uncontrolled keywords:	Image Classification, Convolutional Neural Network, Gaussian Mixture Model, Fisher Vector Coding
Subjects:	T Technology
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Ian McLoughlin
Date Deposited:	03 Dec 2016 16:30 UTC
Last Modified:	20 May 2025 10:19 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/57115 (The current URI for this page, for reference purposes)

University of Kent Author Information

McLoughlin, Ian.

Creator's ORCID:	https://orcid.org/0000-0001-7111-2008