Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks

A Himmi, E Irurozki, N Noiry, S Clémençon… - arXiv preprint arXiv …, 2023 - arxiv.org
The evaluation of natural language processing (NLP) systems is crucial for advancing the
field, but current benchmarking approaches often assume that all systems have scores …

Robustness and risk management via distributional dynamic programming

M Achab, G Neu - arXiv preprint arXiv:2112.15430, 2021 - arxiv.org
In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act
optimally in terms of expected long-term return by sequentially interacting with its …

Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks

P Colombo, A Himmi, E Irurozki, N Noiry, S Clémençon - 2024 - hal.science
The evaluation of natural language processing (NLP) systems is crucial for advancing the
field, but current benchmarking approaches often assume that all systems have scores …

Selective Preference Aggregation

S Kadekodi, H McTavish, B Ustun - NeurIPS 2024 Workshop on Behavioral … - openreview.net
Many tasks in machine learning are shaped by procedures where items are ordered based
on the preferences of a group—whether for funding proposals, recommending products, or …