Authors
Richard Zanibbi, Kenny Davila, Andrew Kane, Frank Wm Tompa
Publication date
2016/7/7
Conference
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
Pages
145-154
Publisher
ACM
Description
When using a mathematical formula for search (query-by-expression), the suitability of retrieved formulae often depends more upon symbol identities and layout than deep mathematical semantics. Using a Symbol Layout Tree representation for formula appearance, we propose the Maximum Subtree Similarity (MSS) for ranking formulae based upon the subexpression whose symbols and layout best match a query formula. Because MSS is too expensive to apply against a complete collection, the Tangent-3 system first retrieves expressions using an inverted index over symbol pair relationships, ranking hits using the Dice coefficient; the top-k formulae are then re-ranked by MSS. Tangent-3 obtains state-of-the-art performance on the NTCIR-11 Wikipedia formula retrieval benchmark, and is efficient in terms of both space and time. Retrieval systems for other graphical forms, including chemical diagrams, flowcharts …
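The first retrieval stage described above ranks candidates with the Dice coefficient over symbol pair relationships. The following is a minimal sketch of that scoring step only, not the Tangent-3 implementation. The tuple layout `(symbol, symbol, relation)`, the relation labels `"a"` (above) and `"n"` (next), the use of plain sets, and the names `dice` and `top_k` are all assumptions made for illustration. MSS re-ranking is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical symbol-pair tuple: (first symbol, second symbol, relation label).
# The labels "a" (above) and "n" (next) are illustrative, not taken from the paper.
SymbolPair = Tuple[str, str, str]


def dice(query: FrozenSet[SymbolPair], candidate: FrozenSet[SymbolPair]) -> float:
    """Dice coefficient 2|Q ∩ C| / (|Q| + |C|) over two sets of symbol pairs."""
    total = len(query) + len(candidate)
    if total == 0:
        return 0.0  # two empty formulae: treat as no match
    return 2.0 * len(query & candidate) / total


def top_k(query: FrozenSet[SymbolPair],
          collection: Dict[str, FrozenSet[SymbolPair]],
          k: int) -> List[Tuple[str, float]]:
    """Score every formula in the collection and keep the k best.

    A real system would consult an inverted index instead of scanning
    the whole collection; the linear scan here is a simplification.
    """
    scored = [(name, dice(query, pairs)) for name, pairs in collection.items()]
    # Sort by descending score, breaking ties by name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Query: x^2 + y, encoded with the hypothetical tuple layout.
query = frozenset({("x", "2", "a"), ("x", "+", "n"), ("+", "y", "n")})

collection = {
    "x^2 + z": frozenset({("x", "2", "a"), ("x", "+", "n"), ("+", "z", "n")}),
    "x^2": frozenset({("x", "2", "a")}),
    "a + b": frozenset({("a", "+", "n"), ("+", "b", "n")}),
}

candidates = top_k(query, collection, k=2)
```

Here `candidates` holds the two best-scoring formulae, which a second stage could then re-rank with a more expensive similarity such as MSS.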
Total citations
[Chart: citations per year, 2016 to 2024]
Scholar articles
R Zanibbi, K Davila, A Kane, FW Tompa - Proceedings of the 39th International ACM SIGIR …, 2016