Annotation error detection: Analyzing the past and present for a more coherent future

JC Klie, B Webber, I Gurevych - Computational Linguistics, 2023 - direct.mit.edu
Annotated data is an essential ingredient in natural language processing for training and
evaluating machine learning models. It is therefore very desirable for the annotations to be …

Outlier detection for improved data quality and diversity in dialog systems

S Larson, A Mahendran, A Lee, JK Kummerfeld… - arXiv preprint arXiv …, 2019 - arxiv.org
In a corpus of data, outliers are either errors: mistakes in the data that are counterproductive,
or are unique: informative samples that improve model robustness. Identifying outliers can …

Learning latent parameters without human response patterns: Item response theory with artificial crowds

JP Lalor, H Wu, H Yu - Proceedings of the Conference on Empirical …, 2019 - ncbi.nlm.nih.gov
Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable
information about model performance and behavior. Traditionally, IRT models are learned …

Leitner-Guided Memory Replay for Cross-lingual Continual Learning

M M'hamdi, J May - Proceedings of the 2024 Conference of the …, 2024 - aclanthology.org
Cross-lingual continual learning aims to continuously fine-tune a downstream model on
emerging data from new languages. One major challenge in cross-lingual continual learning …

ActiveAED: A human in the loop improves annotation error detection

L Weber, B Plank - arXiv preprint arXiv:2305.20045, 2023 - arxiv.org
Manually annotated datasets are crucial for training and evaluating Natural Language
Processing models. However, recent work has discovered that even widely-used benchmark …

Uncovering Misattributed Suicide Causes through Annotation Inconsistency Detection in Death Investigation Notes

S Wang, Y Zhou, Z Han, C Tao, Y Xiao, Y Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Data accuracy is essential for scientific research and policy development. The National
Violent Death Reporting System (NVDRS) data is widely used for discovering the patterns …

Generalizable Error Modeling for Search Relevance Data Annotation Tasks

H Peters, A Hashemi, J Rae - arXiv preprint arXiv:2310.05286, 2023 - arxiv.org
Human data annotation is critical in shaping the quality of machine learning (ML) and
artificial intelligence (AI) systems. One significant challenge in this context is posed by …

Learning latent characteristics of data and models using item response theory

JP Lalor - 2020 - scholarworks.umass.edu
A supervised machine learning model is trained with a large set of labeled training data, and
evaluated on a smaller but still large set of test data. Especially with deep neural networks …

Improving Natural Language Dataset Annotation Quality and Efficiency

JC Klie - tuprints.ulb.tu-darmstadt.de
Annotated data is essential in many scientific disciplines, including natural language
processing, linguistics, language acquisition research, bioinformatics, healthcare, or the …

Improving the Dark Web Classifier with Active Learning and Annotation Error Detection

P San Gil, R Pernisch, E Haspels - bnaic2023.tudelft.nl
Active learning (AL) methods aim to reduce the human labeling effort by selecting the most
significant unlabeled samples. Annotation error detection (AED) strategies aim to identify …