In a corpus of data, outliers are either errors: mistakes in the data that are counterproductive, or are unique: informative samples that improve model robustness. Identifying outliers can …
JP Lalor, H Wu, H Yu - Proceedings of the Conference on Empirical …, 2019 - ncbi.nlm.nih.gov
Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned …
M M'hamdi, J May - Proceedings of the 2024 Conference of the …, 2024 - aclanthology.org
Cross-lingual continual learning aims to continuously fine-tune a downstream model on emerging data from new languages. One major challenge in cross-lingual continual learning …
L Weber, B Plank - arXiv preprint arXiv:2305.20045, 2023 - arxiv.org
Manually annotated datasets are crucial for training and evaluating Natural Language Processing models. However, recent work has discovered that even widely-used benchmark …
Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) data is widely used for discovering the patterns …
H Peters, A Hashemi, J Rae - arXiv preprint arXiv:2310.05286, 2023 - arxiv.org
Human data annotation is critical in shaping the quality of machine learning (ML) and artificial intelligence (AI) systems. One significant challenge in this context is posed by …
A supervised machine learning model is trained with a large set of labeled training data, and evaluated on a smaller but still large set of test data. Especially with deep neural networks …
Annotated data is essential in many scientific disciplines, including natural language processing, linguistics, language acquisition research, bioinformatics, healthcare, or the …
P San Gil, R Pernisch, E Haspels - bnaic2023.tudelft.nl
Active learning (AL) methods aim to reduce the human labeling effort by selecting the most significant unlabeled samples. Annotation error detection (AED) strategies aim to identify …