On generalization in coreference resolution

S Toshniwal, P Xia, S Wiseman, K Livescu… - arXiv preprint arXiv …, 2021 - arxiv.org
While coreference resolution is defined independently of dataset domain, most models for
performing coreference resolution do not transfer well to unseen domains. We consolidate a …

Transformers go for the LOLs: Generating (humourous) titles from scientific abstracts end-to-end

Y Chen, S Eger - arXiv preprint arXiv:2212.10522, 2022 - arxiv.org
We consider the end-to-end abstract-to-title generation problem, exploring seven recent
transformer-based models (including ChatGPT) fine-tuned on more than 30k abstract-title …

The CODI-CRAC 2021 shared task on anaphora, bridging, and discourse deixis in dialogue

S Khosla, J Yu, R Manuvinakurike, V Ng… - Proceedings of the …, 2021 - aclanthology.org
In this paper, we provide an overview of the CODI-CRAC 2021 Shared-Task: Anaphora
Resolution in Dialogue. The shared task focuses on detecting anaphoric relations in …

The CODI-CRAC 2022 shared task on anaphora, bridging, and discourse deixis in dialogue

J Yu, S Khosla, R Manuvinakurike, L Levin… - Proceedings of the …, 2022 - aclanthology.org
The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the
second edition of an initiative focused on detecting different types of anaphoric relations in …

What's Hard in English RST Parsing? Predictive Models for Error Analysis

YJ Liu, T Aoyama, A Zeldes - arXiv preprint arXiv:2309.04940, 2023 - arxiv.org
Despite recent advances in Natural Language Processing (NLP), hierarchical discourse
parsing in the framework of Rhetorical Structure Theory remains challenging, and our …

Investigating failures to generalize for coreference resolution models

I Porada, A Olteanu, K Suleman, A Trischler… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Coreference resolution models are often evaluated on multiple datasets. Datasets vary,
however, in how coreference is realized, i.e., how the theoretical concept of coreference is …

Challenges to evaluating the generalization of coreference resolution models: A measurement modeling perspective

I Porada, A Olteanu, K Suleman… - Findings of the …, 2024 - aclanthology.org
It is increasingly common to evaluate the same coreference resolution (CR) model on
multiple datasets. Do these multi-dataset evaluations allow us to draw meaningful …

GENTLE: A genre-diverse multilayer challenge set for English NLP and linguistic evaluation

T Aoyama, S Behzad, L Gessler, L Levine, J Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and
consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports …

A Controlled Reevaluation of Coreference Resolution Models

I Porada, X Zou, JCK Cheung - arXiv preprint arXiv:2404.00727, 2024 - arxiv.org
All state-of-the-art coreference resolution (CR) models involve finetuning a pretrained
language model. Whether the superior performance of one CR model over another is due to …

Major Entity Identification: A Generalizable Alternative to Coreference Resolution

K Manikantan, S Toshniwal, M Tapaswi… - arXiv preprint arXiv …, 2024 - arxiv.org
The limited generalization of coreference resolution (CR) models has been a major
bottleneck in the task's broad application. Prior work has identified annotation differences …