Michael Stark, Jonathan Krause, Bojan Pepik, David Meger, James J Little, Bernt Schiele, Daphne Koller
British Machine Vision Conference (BMVC)
Basic-level object category recognition has made remarkable progress over the last decade, both in image-level categorization and bounding box localization settings [3]. More recently, the recognition of finer-grained, subordinate categories has received increasing attention [1, 2, 4, 7, 8, 11, 12]. This task is considered challenging because it requires capturing subtle appearance differences between categories while remaining robust to intra-category variations induced by changes in pose and viewpoint. As a consequence, previous work has focused mostly on object categories and methods that favor discrimination by strong local appearance cues (such as random color image patches for birds [12]) or global image statistics (such as color histograms for flowers [8]). Our paper goes beyond previous work on fine-grained categorization in two ways. First, in addition to exploring the task of fine-grained categorization itself, we suggest the use of fine-grained category predictions as an input for higher-level reasoning. This is based on the observation that fine-grained categories can encode, among other aspects, information about metric object sizes, which can in turn provide geometric constraints for scene-level reasoning. Accordingly, we focus our attention on rigid, geometric objects that can provide, if correctly categorized, reliable metric size estimates, and introduce a novel dataset of fine-grained car types as a test bed for our approach (Fig. 1). This dataset is annotated with 2D bounding boxes, viewpoint estimates, and car types, and additionally includes metric object sizes (length, width, and height) for geometric reasoning …
M Stark, J Krause, B Pepik, D Meger, JJ Little… - International Journal of Robotics Research, 2011