SECTOR: A neural model for coherent topic segmentation and classification

S Arnold, R Schneider, P Cudré-Mauroux… - Transactions of the …, 2019 - direct.mit.edu
When searching for information, a human reader first glances over a document, spots
relevant sections, and then focuses on a few sentences for resolving her intention. However …

Cross-lingual language model pretraining for retrieval

P Yu, H Fei, P Li - Proceedings of the Web Conference 2021, 2021 - dl.acm.org
Existing research on cross-lingual retrieval cannot take good advantage of large-scale
pretrained language models such as multilingual BERT and XLM. We hypothesize that the …

Quantifying engagement with citations on Wikipedia

T Piccardi, M Redi, G Colavizza, R West - Proceedings of The Web …, 2020 - dl.acm.org
Wikipedia is one of the most visited sites on the Web and a common source of information
for many users. As an encyclopedia, Wikipedia was not conceived as a source of original …

Controlled analyses of social biases in Wikipedia bios

A Field, CY Park, KZ Lin, Y Tsvetkov - … of the ACM Web Conference 2022, 2022 - dl.acm.org
Social biases on Wikipedia, a widely-read global platform, could greatly influence public
opinion. While prior research has examined man/woman gender bias in biography articles …

A large-scale characterization of how readers browse Wikipedia

T Piccardi, M Gerlach, A Arora, R West - ACM Transactions on the Web, 2023 - dl.acm.org
Despite the importance and pervasiveness of Wikipedia as one of the largest platforms for
open knowledge, surprisingly little is known about how people navigate its content when …

Language-agnostic topic classification for Wikipedia

I Johnson, M Gerlach, D Sáez-Trumper - Companion Proceedings of the …, 2021 - dl.acm.org
A major challenge for many analyses of Wikipedia dynamics—e.g., imbalances in content
quality, geographic differences in what content is popular, what types of articles attract more …

Learning entity-centric document representations using an entity facet topic model

C Wu, E Kanoulas, M de Rijke - Information Processing & Management, 2020 - Elsevier
Learning semantic representations of documents is essential for various downstream
applications, including text classification and information retrieval. Entities, as important …

Scalable recommendation of Wikipedia articles to editors using representation learning

O Moskalenko, D Parra, D Saez-Trumper - arXiv preprint arXiv …, 2020 - arxiv.org
Wikipedia is edited by volunteer editors around the world. Considering the large amount of
existing content (e.g., over 5M articles in English Wikipedia), deciding what to edit next can be …

Crosslingual document embedding as reduced-rank ridge regression

M Josifoski, IS Paskov, HS Paskov, M Jaggi… - Proceedings of the …, 2019 - dl.acm.org
There has recently been much interest in extending vector-based word representations to
multiple languages, such that words can be compared across languages. In this paper, we …

[BOOK][B] Making Presentation Math Computable: A Context-Sensitive Approach for Translating LaTeX to Computer Algebra Systems

A Greiner-Petter - 2023 - library.oapen.org
This Open-Access-book addresses the issue of translating mathematical expressions from
LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially …