Agreement-on-the-line: Predicting the performance of neural networks under distribution shift

C Baek, Y Jiang, A Raghunathan… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, Miller et al. showed that a model's in-distribution (ID) accuracy has a strong linear
correlation with its out-of-distribution (OOD) accuracy, on several OOD benchmarks, a …
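The "accuracy-on-the-line" observation the snippet refers to can be sketched in a few lines: collect (ID accuracy, OOD accuracy) pairs from several models, fit a line, and predict a new model's OOD accuracy from its ID accuracy alone. The accuracy values below are illustrative, not from the paper, and the real study works with probit-scaled accuracies, which this minimal sketch omits.

```python
import numpy as np

# Illustrative (ID accuracy, OOD accuracy) pairs from several models.
id_acc = np.array([0.70, 0.75, 0.80, 0.85, 0.90])
ood_acc = np.array([0.52, 0.58, 0.64, 0.70, 0.76])

# Fit the linear trend ood = slope * id + intercept across models.
slope, intercept = np.polyfit(id_acc, ood_acc, deg=1)

# Predict a held-out model's OOD accuracy from its ID accuracy alone.
def predict_ood(acc_id):
    return slope * acc_id + intercept

print(round(predict_ood(0.82), 3))  # -> 0.664 for these illustrative numbers
```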

Neural unsupervised domain adaptation in NLP---a survey

A Ramponi, B Plank - arXiv preprint arXiv:2006.00632, 2020 - arxiv.org
Deep neural networks excel at learning from labeled data and achieve state-of-the-art
results on a wide array of Natural Language Processing tasks. In contrast, learning from …

Assessing generalization of SGD via disagreement

Y Jiang, V Nagarajan, C Baek, JZ Kolter - arXiv preprint arXiv:2106.13799, 2021 - arxiv.org
We empirically show that the test error of deep networks can be estimated by simply training
the same architecture on the same training set but with a different run of Stochastic Gradient …
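The disagreement estimator described here reduces to something very simple once the two runs' predictions are in hand: the fraction of unlabeled inputs on which the runs disagree serves as an estimate of the (unknown) test error. The prediction arrays below are made up for illustration; in practice they would come from two trainings of the same architecture with different SGD seeds.

```python
import numpy as np

# Predicted labels from two runs of the same architecture, trained on the
# same data with different SGD randomness (illustrative values).
preds_run_a = np.array([0, 1, 1, 0, 2, 2, 1, 0])
preds_run_b = np.array([0, 1, 0, 0, 2, 1, 1, 0])

# Disagreement rate on unlabeled test inputs; the paper's empirical claim
# is that this rate tracks the true test error, without any labels.
disagreement = float(np.mean(preds_run_a != preds_run_b))
print(disagreement)  # -> 0.25 (runs differ on 2 of 8 inputs)
```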

The NLP cookbook: modern recipes for transformer based deep learning architectures

S Singh, A Mahmood - IEEE Access, 2021 - ieeexplore.ieee.org
In recent years, Natural Language Processing (NLP) models have achieved phenomenal
success in linguistic and semantic tasks like text classification, machine translation, cognitive …

Detecting errors and estimating accuracy on unlabeled data with self-training ensembles

J Chen, F Liu, B Avci, X Wu… - Advances in Neural …, 2021 - proceedings.neurips.cc
When a deep learning model is deployed in the wild, it can encounter test data drawn from
distributions different from the training data distribution and suffer drop in performance. For …

Temporal effects on pre-trained models for language processing tasks

O Agarwal, A Nenkova - Transactions of the Association for …, 2022 - direct.mit.edu
Keeping the performance of language technologies optimal as time passes is of great
practical interest. We study temporal effects on model performance on downstream …

FactKB: Generalizable factuality evaluation using language models enhanced with factual knowledge

S Feng, V Balachandran, Y Bai, Y Tsvetkov - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factual consistency of automatically generated summaries is essential for the
progress and adoption of reliable summarization systems. Despite recent advances, existing …

Generalization and personalization of mobile sensing-based mood inference models: an analysis of college students in eight countries

L Meegahapola, W Droz, P Kun, A De Götzen… - Proceedings of the …, 2023 - dl.acm.org
Mood inference with mobile sensing data has been studied in ubicomp literature over the
last decade. This inference enables context-aware and personalized user experiences in …

Predicting out-of-distribution error with the projection norm

Y Yu, Z Yang, A Wei, Y Ma… - … Conference on Machine …, 2022 - proceedings.mlr.press
We propose a metric—Projection Norm—to predict a model's performance on out-of-
distribution (OOD) data without access to ground truth labels. Projection Norm first uses …
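The Projection Norm idea can be sketched as: pseudo-label the unlabeled OOD inputs with the current model, refit a model on those pseudo-labels, and measure how far the refit parameters move from the reference parameters; larger movement signals larger shift. The linear model and least-squares refit below are an illustrative stand-in for the paper's network fine-tuning, with all data synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference model weights (stand-in for a trained network's parameters).
w_ref = rng.normal(size=10)

# Step 1: pseudo-label unlabeled OOD inputs with the reference model
# (hard labels via sign, as a classification-like stand-in).
X_ood = rng.normal(size=(50, 10))
pseudo_y = np.sign(X_ood @ w_ref)

# Step 2: refit a fresh model on the pseudo-labels (least squares here,
# as a cheap stand-in for fine-tuning).
w_new, *_ = np.linalg.lstsq(X_ood, pseudo_y, rcond=None)

# Step 3: Projection Norm = parameter distance between refit and reference;
# larger values indicate larger shift and, empirically, higher OOD error.
proj_norm = float(np.linalg.norm(w_new - w_ref))
```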

Investigating selective prediction approaches across several tasks in iid, ood, and adversarial settings

N Varshney, S Mishra, C Baral - arXiv preprint arXiv:2203.00211, 2022 - arxiv.org
In order to equip NLP systems with selective prediction capability, several task-specific
approaches have been proposed. However, which approaches work best across tasks or …
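One of the simplest selective-prediction baselines evaluated in work like this is MaxProb: answer only when the model's top softmax probability clears a threshold, and abstain otherwise. The probabilities and threshold below are illustrative, not taken from the paper.

```python
import numpy as np

# Softmax probabilities for a batch of three inputs (illustrative).
probs = np.array([
    [0.95, 0.03, 0.02],   # confident -> predict class 0
    [0.40, 0.35, 0.25],   # uncertain -> abstain
    [0.10, 0.85, 0.05],   # confident -> predict class 1
])

# MaxProb selective prediction: predict argmax only when the top
# probability clears the threshold; -1 marks abstention.
threshold = 0.5
confidence = probs.max(axis=1)
decisions = np.where(confidence >= threshold, probs.argmax(axis=1), -1)
print(decisions.tolist())  # -> [0, -1, 1]
```

Raising the threshold trades coverage (how often the system answers) for risk (error rate on answered inputs), which is the axis these approaches are compared along.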