Scalable image search based on visual similarity has been an active topic of research in recent years.
State-of-the-art solutions often use hashing methods to embed high-dimensional image features into Hamming space,
where search can be performed in real time based on the Hamming distance between compact hash codes. Unlike traditional metrics
(e.g., Euclidean) that offer continuous distances, Hamming distances are discrete integer values. As a consequence,
a large number of images often share equal Hamming distances to a query, which severely degrades search quality
in applications where fine-grained ranking is important.
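To make the tie problem concrete, the following sketch enumerates all 12-bit hash codes and groups them by Hamming distance to a toy all-zero query; the query value and bit length are illustrative choices, not taken from the paper.

```python
# Sketch: with 12-bit hash codes, many distinct codes share the same
# integer Hamming distance to a query (here, a toy all-zero query).
from itertools import product

BITS = 12
query = (0,) * BITS

def hamming(a, b):
    # Count the number of differing bit positions.
    return sum(x != y for x, y in zip(a, b))

# Group all 2^12 possible codes by their Hamming distance to the query.
ties = {}
for code in product((0, 1), repeat=BITS):
    d = hamming(code, query)
    ties[d] = ties.get(d, 0) + 1

print(ties[1])  # 12 codes at distance 1 -- indistinguishable by plain Hamming distance
print(ties[2])  # 66 codes at distance 2
```

In general there are C(n, d) codes at distance d from any n-bit query, so ties grow rapidly with the code length.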
This work introduces an approach that enables query-adaptive ranking of returned images that share equal Hamming distances to a query. This is achieved by first learning, offline, bitwise weights of the hash codes for a diverse set of predefined semantic concept classes. We formulate the weight-learning process as a quadratic programming problem that minimizes intra-class distance while preserving the inter-class relationships captured by the original raw image features. Query-adaptive weights are then computed online by evaluating the proximity between a query and the semantic concept classes. With these query-adaptive bitwise weights, returned images can be ordered by weighted Hamming distance at a finer-grained hash-code level rather than at the original integer Hamming distance level. Experiments on a Flickr image dataset show clear improvements.
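The ranking step can be sketched as follows. This is a minimal illustration, assuming query-adaptive bitwise weights have already been obtained; the weight values, query code, and flipped bit positions below are all hypothetical, not learned by the paper's quadratic program.

```python
import numpy as np

# Hypothetical query-adaptive bitwise weights for a 12-bit code. In the
# paper these are learned offline per semantic concept class and combined
# online per query; the values here are illustrative only.
weights = np.array([0.9, 0.5, 0.7, 0.3, 0.8, 0.2,
                    0.6, 0.4, 0.95, 0.1, 0.55, 0.35])

query = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=np.uint8)

def weighted_hamming(a, b, w):
    # Each differing bit contributes its weight instead of a flat 1.
    return float(np.sum(w * (a != b)))

# Three database codes at identical plain Hamming distance 1 from the
# query, obtained by flipping bits 0, 5, and 9 respectively.
codes = []
for bit in (0, 5, 9):
    c = query.copy()
    c[bit] ^= 1
    codes.append(c)

# Plain Hamming distance cannot rank them...
plain = [int(np.sum(query != c)) for c in codes]
# ...but weighted Hamming distance induces a fine-grained order:
ranked = sorted(range(3), key=lambda i: weighted_hamming(query, codes[i], weights))
print(plain)   # [1, 1, 1]
print(ranked)  # [2, 1, 0]: the code flipping the lowest-weight bit ranks first
```

Because each weight reflects how discriminative its bit is for the current query, flipping an unimportant bit costs little, and the previously tied results separate cleanly.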
Figure 1: An illustration of the proposed query-adaptive image search approach, using an example query represented by a 12-bit hash code. Traditionally, hashing-based search results are ordered by integer Hamming distance, which is not ideal since many different hash codes share the same distance to the query. For instance, in this example there are 12 hash codes with Hamming distance 1 (each differs from the query in one bit). The proposed approach orders results at a finer-grained hash-code level. As exemplified for the images with Hamming distance 1 to the query, we propose a way to differentiate the ranking of the 12 different hash codes, where the order is determined online, adaptively for each query.
Figure 2: Per-category performance comparison on the NUS-WIDE Flickr image dataset, using 48-bit codes learned by Deep Belief Networks.
8,000 randomly selected queries are used, grouped into 81 categories based on their associated labels.