Part-level Attributes for Visual Recognition

[See more details in our ECCV'12 paper]

Overview

We introduce a new image representation for visual recognition in which an image is described by the response maps of object part filters. The part filters are learned from existing datasets with object location annotations, using deformable part-based models (DPM) trained with latent SVM. In contrast to previous visual recognition approaches that use object-level detections as feature inputs, we use the filter responses of object parts, which yield a richer and finer-grained representation. In extensive experiments on several scene recognition benchmarks, our approach outperforms several state-of-the-art methods.

To speed up the computation of the response maps, we use sparse coding to learn a small number of base filters from which the much larger set of original part filters can be closely reconstructed. As a result, we only need to compute the response maps of the base filters; the maps of the original part filters are then reconstructed efficiently from them. This makes our proposed representation much faster to compute without hurting recognition performance.
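The speed-up rests on the linearity of filtering: if a part filter is (approximately) a weighted sum of base filters, its response map is the same weighted sum of the base filters' response maps. The NumPy sketch below illustrates only this property and is not the released implementation. The function names, the filter and feature-map sizes, and the randomly generated sparse coefficients are all assumptions made for illustration; in practice the coefficients would come from a sparse coding solver, and the reconstruction would be approximate rather than exact.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(feature_map, filt):
    """Valid cross-correlation of an (H, W, C) feature map with an (h, w, C) filter."""
    h, w, c = filt.shape
    # windows has shape (H-h+1, W-w+1, 1, h, w, C)
    windows = sliding_window_view(feature_map, (h, w, c))
    return np.einsum("ijzhwc,hwc->ij", windows, filt)


def reconstruct_maps(base_maps, coeffs):
    """Combine K base response maps (K, H', W') into N part maps (N, H', W').

    coeffs has shape (N, K): row i holds the weights that express
    part filter i as a combination of the K base filters.
    """
    return np.tensordot(coeffs, base_maps, axes=([1], [0]))


# Hypothetical sizes: 4 base filters, 10 part filters, 3x3 filters over 5 channels.
rng = np.random.default_rng(0)
base_filters = rng.standard_normal((4, 3, 3, 5))
coeffs = rng.standard_normal((10, 4))
coeffs[rng.random(coeffs.shape) < 0.5] = 0.0  # make the weights sparse

# Part filters built exactly from the bases, so the identity holds exactly here.
part_filters = np.tensordot(coeffs, base_filters, axes=([1], [0]))
feature_map = rng.standard_normal((12, 14, 5))

# Slow path: filter the image once per part filter (10 correlations).
direct = np.stack([correlate_valid(feature_map, f) for f in part_filters])

# Fast path: filter once per base filter (4 correlations), then mix the maps.
base_maps = np.stack([correlate_valid(feature_map, b) for b in base_filters])
recon = reconstruct_maps(base_maps, coeffs)

print("max abs difference:", np.abs(direct - recon).max())
```

The expensive step (filtering the image) scales with the number of base filters, while the reconstruction is a single small matrix product per map location, which is why a compact set of base filters pays off when the number of part filters is large.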

Figure: Illustration of our approach. A large set of object part filters is first generated from existing object labels using the DPM (object class names are shown below the part filters and sub-image exemplars). A small set of base filters is then learned by sparse coding and applied to an input image at multiple scales. The response maps of the base filters are used to quickly reconstruct the maps of the original part filters. The reconstructed filter maps are consolidated with a three-level spatial pyramid to produce the final representation, which simply concatenates the max response value of each filter at each image scale and in each spatial grid cell.
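The final pooling step described in the caption can be sketched as follows. This is a minimal illustration, not the released code: the function names are hypothetical, and the 1x1, 2x2 and 4x4 grids are an assumed choice for the three pyramid levels, since the page does not state the grid sizes.

```python
import numpy as np


def pyramid_max_pool(response_maps, grids=(1, 2, 4)):
    """Max-pool N response maps (N, H, W) over a spatial pyramid.

    Returns a 1-D vector with N values per grid cell, i.e.
    N * sum(g * g for g in grids) entries. Requires H, W >= max(grids).
    """
    n, h, w = response_maps.shape
    feats = []
    for g in grids:
        # Integer cell boundaries that cover the whole map.
        rows = [(i * h) // g for i in range(g + 1)]
        cols = [(j * w) // g for j in range(g + 1)]
        for i in range(g):
            for j in range(g):
                cell = response_maps[:, rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                feats.append(cell.max(axis=(1, 2)))  # one max per filter
    return np.concatenate(feats)


def image_descriptor(maps_per_scale, grids=(1, 2, 4)):
    """Concatenate the pyramid features computed at each image scale."""
    return np.concatenate([pyramid_max_pool(m, grids) for m in maps_per_scale])


# Hypothetical example: 2 filters, response maps at two image scales.
rng = np.random.default_rng(0)
maps_per_scale = [rng.standard_normal((2, 8, 8)), rng.standard_normal((2, 16, 16))]
descriptor = image_descriptor(maps_per_scale)
print(descriptor.shape)
```

Under these assumptions each filter contributes 1 + 4 + 16 = 21 values per scale, so the descriptor length is the number of filters times 21 times the number of scales.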

Related Publications

Yingbin Zheng, Yu-Gang Jiang, Xiangyang Xue, Learning Hybrid Part Filters for Scene Recognition, European Conference on Computer Vision (ECCV), Firenze, Italy, Oct. 2012.

This paper proposes the idea of using part filters to form an attribute feature. The sparse coding component of the framework described above is unpublished.

Source Code

Click here to download. (Written by Jian Tu and Yingbin Zheng)