Authors
Benjamin Z Yao, Xiong Yang, Liang Lin, Mun Wai Lee, Song-Chun Zhu
Publication date
2010/6/17
Journal
Proceedings of the IEEE
Volume
98
Issue
8
Pages
1485-1508
Publisher
IEEE
Description
In this paper, we present an image parsing to text description (I2T) framework that generates text descriptions of image and video content based on image understanding. The proposed I2T framework follows three steps: 1) input images (or video frames) are decomposed into their constituent visual patterns by an image parsing engine, in a spirit similar to parsing sentences in natural language; 2) the image parsing results are converted into a semantic representation in the form of the Web Ontology Language (OWL), which enables seamless integration with general knowledge bases; and 3) a text generation engine converts the results from the previous steps into semantically meaningful, human-readable, and queryable text reports. The centerpiece of the I2T framework is an and-or graph (AoG) visual knowledge representation, which provides a graphical representation serving as prior knowledge for representing diverse …
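The three steps listed in the description can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `ParseNode`, `parse_image`, `to_triples`, and `to_text` are hypothetical, the parse result is hard-coded rather than computed from pixels, and plain (subject, relation, object) tuples stand in for a real OWL representation.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    """One node of a toy parse graph: a label plus its child parts (hypothetical type)."""
    label: str
    children: list["ParseNode"] = field(default_factory=list)


def parse_image(image_id: str) -> ParseNode:
    """Step 1 stand-in: return a hard-coded parse instead of analysing pixels."""
    return ParseNode("scene", [
        ParseNode("road", [ParseNode("car")]),
        ParseNode("sky"),
    ])


def to_triples(node: ParseNode) -> list[tuple[str, str, str]]:
    """Step 2 stand-in: flatten the parse graph into (subject, relation, object)
    triples. A real system would emit OWL; tuples are an assumption made here."""
    triples = []
    for child in node.children:
        triples.append((node.label, "hasPart", child.label))
        triples.extend(to_triples(child))
    return triples


def to_text(triples: list[tuple[str, str, str]]) -> str:
    """Step 3 stand-in: template-based sentence generation from the triples."""
    return " ".join(f"The {s} contains a {o}." for s, _, o in triples)


if __name__ == "__main__":
    print(to_text(to_triples(parse_image("example.jpg"))))
```

Keeping the intermediate triples separate from the final sentences mirrors the point made in the description that the semantic layer, not the text, is what integrates with knowledge bases and supports queries.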
Total citations
[Bar chart: citations per year, 2009 to 2024; per-year counts are not reliably recoverable from the extracted text]
Scholar articles
BZ Yao, X Yang, L Lin, MW Lee, SC Zhu - Proceedings of the IEEE, 2010