The objective of this paper is to improve large scale visual object retrieval for visual place recognition. Geo-localization based on a visual query is made difficult by plenty of non-distinctive features which commonly occur in imagery of urban environments, such as generic modern windows, doors, cars, trees, etc. The focus of this work is to adapt standard Hamming Embedding retrieval system to account for varying descriptor distinctiveness. To this end, we propose a novel method for efficiently estimating distinctiveness of all database descriptors, based on estimating local descriptor density everywhere in the descriptor space. In contrast to all competing methods, the (unsupervised) training time for our method (DisLoc) is linear in the number database descriptors and takes only a 100 s on a single CPU core for a 1 million image database. Furthermore, the added memory requirements are negligible (1 %).
The method is evaluated on standard publicly available large-scale place recognition benchmarks containing street-view imagery of Pittsburgh and San Francisco. DisLoc is shown to outperform all baselines, while setting the new state-of-the-art on both benchmarks. The method is compatible with spatial reranking, which further improves recognition results.
Finally, we also demonstrate that 7 % of the least distinctive features can be removed, therefore reducing storage requirements and improving retrieval speed, without any loss in place recognition accuracy.