Agreement-on-the-line: Predicting the performance of neural networks under distribution shift

C Baek, Y Jiang, A Raghunathan… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, Miller et al. showed that a model's in-distribution (ID) accuracy has a strong linear
correlation with its out-of-distribution (OOD) accuracy, on several OOD benchmarks, a …
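The "accuracy-on-the-line" observation the snippet refers to can be sketched in a few lines: collect (ID accuracy, OOD accuracy) pairs from several models, fit a line, and predict a new model's OOD accuracy from its ID accuracy alone. The accuracy values below are illustrative, not from the paper, and the real study works with probit-scaled accuracies, which this minimal sketch omits.

```python
import numpy as np

# Illustrative (ID accuracy, OOD accuracy) pairs from several models.
id_acc = np.array([0.70, 0.75, 0.80, 0.85, 0.90])
ood_acc = np.array([0.52, 0.58, 0.64, 0.70, 0.76])

# Fit the linear trend ood = slope * id + intercept across models.
slope, intercept = np.polyfit(id_acc, ood_acc, deg=1)

# Predict a held-out model's OOD accuracy from its ID accuracy alone.
def predict_ood(acc_id):
    return slope * acc_id + intercept

print(round(predict_ood(0.82), 3))  # -> 0.664 for these illustrative numbers
```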

Neural unsupervised domain adaptation in NLP---a survey

A Ramponi, B Plank - arXiv preprint arXiv:2006.00632, 2020 - arxiv.org
Deep neural networks excel at learning from labeled data and achieve state-of-the-art
results on a wide array of Natural Language Processing tasks. In contrast, learning from …

Assessing generalization of SGD via disagreement

Y Jiang, V Nagarajan, C Baek, JZ Kolter - arXiv preprint arXiv:2106.13799, 2021 - arxiv.org
We empirically show that the test error of deep networks can be estimated by simply training
the same architecture on the same training set but with a different run of Stochastic Gradient …
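The disagreement estimator described here reduces to something very simple once the two runs' predictions are in hand: the fraction of unlabeled inputs on which the runs disagree serves as an estimate of the (unknown) test error. The prediction arrays below are made up for illustration; in practice they would come from two trainings of the same architecture with different SGD seeds.

```python
import numpy as np

# Predicted labels from two runs of the same architecture, trained on the
# same data with different SGD randomness (illustrative values).
preds_run_a = np.array([0, 1, 1, 0, 2, 2, 1, 0])
preds_run_b = np.array([0, 1, 0, 0, 2, 1, 1, 0])

# Disagreement rate on unlabeled test inputs; the paper's empirical claim
# is that this rate tracks the true test error, without any labels.
disagreement = float(np.mean(preds_run_a != preds_run_b))
print(disagreement)  # -> 0.25 (runs differ on 2 of 8 inputs)
```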

The NLP cookbook: modern recipes for transformer based deep learning architectures

S Singh, A Mahmood - IEEE Access, 2021 - ieeexplore.ieee.org
In recent years, Natural Language Processing (NLP) models have achieved phenomenal
success in linguistic and semantic tasks like text classification, machine translation, cognitive …

Detecting errors and estimating accuracy on unlabeled data with self-training ensembles

J Chen, F Liu, B Avci, X Wu… - Advances in Neural …, 2021 - proceedings.neurips.cc
When a deep learning model is deployed in the wild, it can encounter test data drawn from
distributions different from the training data distribution and suffer drop in performance. For …

Temporal effects on pre-trained models for language processing tasks

O Agarwal, A Nenkova - Transactions of the Association for …, 2022 - direct.mit.edu
Keeping the performance of language technologies optimal as time passes is of great
practical interest. We study temporal effects on model performance on downstream …

FactKB: Generalizable factuality evaluation using language models enhanced with factual knowledge

S Feng, V Balachandran, Y Bai, Y Tsvetkov - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factual consistency of automatically generated summaries is essential for the
progress and adoption of reliable summarization systems. Despite recent advances, existing …

Generalization and personalization of mobile sensing-based mood inference models: an analysis of college students in eight countries

L Meegahapola, W Droz, P Kun, A De Götzen… - Proceedings of the …, 2023 - dl.acm.org
Mood inference with mobile sensing data has been studied in ubicomp literature over the
last decade. This inference enables context-aware and personalized user experiences in …

Predicting out-of-distribution error with the projection norm

Y Yu, Z Yang, A Wei, Y Ma… - … Conference on Machine …, 2022 - proceedings.mlr.press
We propose a metric—Projection Norm—to predict a model's performance on out-of-
distribution (OOD) data without access to ground truth labels. Projection Norm first uses …
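The Projection Norm idea can be sketched as: pseudo-label the unlabeled OOD inputs with the current model, refit a model on those pseudo-labels, and measure how far the refit parameters move from the reference parameters; larger movement signals larger shift. The linear model and least-squares refit below are an illustrative stand-in for the paper's network fine-tuning, with all data synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference model weights (stand-in for a trained network's parameters).
w_ref = rng.normal(size=10)

# Step 1: pseudo-label unlabeled OOD inputs with the reference model
# (hard labels via sign, as a classification-like stand-in).
X_ood = rng.normal(size=(50, 10))
pseudo_y = np.sign(X_ood @ w_ref)

# Step 2: refit a fresh model on the pseudo-labels (least squares here,
# as a cheap stand-in for fine-tuning).
w_new, *_ = np.linalg.lstsq(X_ood, pseudo_y, rcond=None)

# Step 3: Projection Norm = parameter distance between refit and reference;
# larger values indicate larger shift and, empirically, higher OOD error.
proj_norm = float(np.linalg.norm(w_new - w_ref))
```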

Investigating selective prediction approaches across several tasks in iid, ood, and adversarial settings

N Varshney, S Mishra, C Baral - arXiv preprint arXiv:2203.00211, 2022 - arxiv.org
In order to equip NLP systems with selective prediction capability, several task-specific
approaches have been proposed. However, which approaches work best across tasks or …
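One of the simplest selective-prediction baselines evaluated in work like this is MaxProb: answer only when the model's top softmax probability clears a threshold, and abstain otherwise. The probabilities and threshold below are illustrative, not taken from the paper.

```python
import numpy as np

# Softmax probabilities for a batch of three inputs (illustrative).
probs = np.array([
    [0.95, 0.03, 0.02],   # confident -> predict class 0
    [0.40, 0.35, 0.25],   # uncertain -> abstain
    [0.10, 0.85, 0.05],   # confident -> predict class 1
])

# MaxProb selective prediction: predict argmax only when the top
# probability clears the threshold; -1 marks abstention.
threshold = 0.5
confidence = probs.max(axis=1)
decisions = np.where(confidence >= threshold, probs.argmax(axis=1), -1)
print(decisions.tolist())  # -> [0, -1, 1]
```

Raising the threshold trades coverage (how often the system answers) for risk (error rate on answered inputs), which is the axis these approaches are compared along.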