A minimally supervised approach for detecting and ranking document translation pairs

S Moran, V Lavrenko, M Osborne - … of the 51st Annual Meeting of …, 2013 - aclanthology.org

We introduce a scheme for optimally allocating a variable number of bits per LSH
hyperplane. Previous approaches assign a constant number of bits per hyperplane. This …

Cited by 35 · Related articles · All 9 versions

Two-stage hashing for fast document retrieval

H Li, W Liu, H Ji - Proceedings of the 52nd Annual Meeting of the …, 2014 - aclanthology.org

This work fulfills sublinear time Nearest Neighbor Search (NNS) in massive-scale document
collections. The primary contribution is to propose a two-stage unsupervised hashing …

Cited by 26 · Related articles · All 8 versions

An empirical study on crosslingual transfer in probabilistic topic models

S Hao, MJ Paul - Computational Linguistics, 2020 - direct.mit.edu

Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge
transfer and extract multilingual features. Although many multilingual topic models have …

Cited by 13 · Related articles · All 6 versions

Large-scale semantic exploration of scientific literature using topic-based hashing algorithms

C Badenes-Olmedo, JL Redondo-Garcia… - Semantic …, 2020 - content.iospress.com

Searching for similar documents and exploring major themes covered across groups of
documents are common activities when browsing collections of scientific papers. This …

Cited by 12 · Related articles · All 10 versions

Efficient nearest-neighbor search in the probability simplex

K Krstovski, DA Smith, HM Wallach… - Proceedings of the 2013 …, 2013 - dl.acm.org

Document similarity tasks arise in many areas of information retrieval and natural language
processing. A fundamental question when comparing documents is which representation to …

Cited by 21 · Related articles · All 8 versions

Online polylingual topic models for fast document translation detection

K Krstovski, DA Smith - Proceedings of the Eighth Workshop on …, 2013 - aclanthology.org

Many tasks in NLP and IR require efficient document similarity computations. Beyond their
common application to exploratory data analysis, latent variable topic models have been …

Cited by 16 · Related articles · All 7 versions

Bootstrapping translation detection and sentence extraction from comparable corpora

K Krstovski, DA Smith - Proceedings of the 2016 Conference of …, 2016 - aclanthology.org

Most work on extracting parallel text from comparable corpora depends on linguistic
resources such as seed parallel documents or translation dictionaries. This paper presents a …

Cited by 11 · Related articles · All 3 versions

Using term position similarity and language modeling for bilingual document alignment

TC Le, HT Vu, J Oberlander, O Bojar - Proceedings of the First …, 2016 - aclanthology.org

The WMT Bilingual Document Alignment Task requires systems to assign source
pages to their “translations”, in a big space of possible pairs. We present four methods: The …

Cited by 11 · Related articles · All 6 versions

Mining relational structure from millions of books: position paper

DA Smith, R Manmatha, J Allan - Proceedings of the 4th ACM workshop …, 2011 - dl.acm.org

Existing large-scale scanned book collections have many shortcomings for data-driven
research, from OCR of variable quality to the lack of accurate descriptive and structural …

Cited by 9 · Related articles · All 2 versions

Finding translations in scanned book collections

IZ Yalniz, R Manmatha - Proceedings of the 35th international ACM …, 2012 - dl.acm.org

This paper describes an approach for identifying translations of books in large scanned
book collections with OCR errors. The method is based on the idea that although individual …

Cited by 7 · Related articles · All 3 versions
