On efficient Bayesian scene interpretation

E Jahangiri - 2016 - jscholarship.library.jhu.edu
Abstract
Scene understanding, including object recognition, is perhaps the most challenging task in computer vision. Deep convolutional neural networks (CNNs) have received a flurry of interest in the past few years due to their superior performance. However, deep networks are computationally expensive and, without an efficient implementation on high-performance computing systems, are not as practical as older methods. Furthermore, CNNs do not benefit from the selective visual attention and top-down contextual feedback connections of the human visual system. The human visual system makes extensive use of contextual information to facilitate and refine object detections; object detection and recognition based only on the intrinsic features of target objects are not usually sufficient for reliable inference. In this thesis, we use a model-based approach to incorporate top-down contextual information and analyze scenes in a coarse-to-fine fashion inspired by visual selective attention. In addition to disambiguating object detections, taking advantage of the contextual relations between different scene entities allows the space of objects and their poses to be searched more efficiently. We present a new approach to efficiently search the space of objects and their poses using a Bayesian method called "Entropy Pursuit", in which contextual relations between object instances and other scene entities are incorporated via a prior model. Using the entropy pursuit approach, we collect bits of information about the scene sequentially by greedily selecting the patches whose analysis is the most informative in an information-theoretic sense. As a proof of concept, we use the entropy pursuit method for multi-category object recognition in table-setting scenes. We investigate the possibility of generating a scene interpretation by processing only a fraction of the patches from an input image.
Our results confirm the hypothesis that an accurate interpretation can be identified by processing only a fraction of the patches, provided the right patches are selected in the right order, which in turn saves computation time.
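The greedy, information-driven patch selection described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it assumes a toy setting in which the scene is one of a few discrete hypotheses, each patch yields a binary answer, and answers are conditionally independent given the scene. All names (`entropy_pursuit`, `information_gain`, `patch_a`, and so on) are hypothetical and chosen for illustration only.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def binary_entropy(q):
    """Entropy in bits of a yes/no variable with P(yes) = q."""
    return entropy([q, 1.0 - q])

def information_gain(posterior, likelihood):
    """Mutual information between the scene X and a binary patch answer Y.

    likelihood[k] = P(Y = yes | X = k); computed as H(Y) - H(Y | X).
    """
    p_yes = sum(p * l for p, l in zip(posterior, likelihood))
    h_cond = sum(p * binary_entropy(l) for p, l in zip(posterior, likelihood))
    return binary_entropy(p_yes) - h_cond

def update(posterior, likelihood, answer):
    """Bayes update of the scene posterior after observing one answer."""
    weights = [p * (l if answer else 1.0 - l)
               for p, l in zip(posterior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_pursuit(prior, tests, observe, budget):
    """Greedily analyze the patch with the largest expected information gain."""
    posterior = list(prior)
    remaining = set(tests)
    order = []
    for _ in range(min(budget, len(tests))):
        best = max(sorted(remaining),
                   key=lambda t: information_gain(posterior, tests[t]))
        remaining.remove(best)
        posterior = update(posterior, tests[best], observe(best))
        order.append(best)
    return order, posterior

# Toy example (assumed values): four scene hypotheses, three candidate patches.
prior = [0.25, 0.25, 0.25, 0.25]
tests = {
    "patch_a": [0.9, 0.9, 0.1, 0.1],  # separates hypotheses {0, 1} from {2, 3}
    "patch_b": [0.9, 0.1, 0.9, 0.1],  # separates hypotheses {0, 2} from {1, 3}
    "patch_c": [0.5, 0.5, 0.5, 0.5],  # uninformative about the scene
}
answers = {"patch_a": True, "patch_b": False, "patch_c": True}

order, posterior = entropy_pursuit(prior, tests, answers.__getitem__, budget=2)
print(order, [round(p, 2) for p in posterior])
```

With a budget of two patches, the sketch never spends an evaluation on the uninformative `patch_c`, and the posterior concentrates on the hypothesis consistent with both answers. In the thesis, the prior model encodes contextual relations between scene entities; here it is reduced to a flat distribution over four hypotheses purely to keep the example short.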