Robustness Gym: Unifying the NLP Evaluation Landscape

K Goel, N Rajani, J Vig, S Tan, J Wu, S Zheng… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite impressive performance on standard benchmarks, deep neural networks are often
brittle when deployed in real-world systems. Consequently, recent research has focused on …

Robustness Gym: Unifying the NLP Evaluation Landscape

K Goel, NF Rajani, J Vig, Z Taschdjian… - Proceedings of the …, 2021 - aclanthology.org
Despite impressive performance on standard benchmarks, natural language processing
(NLP) models are often brittle when deployed in real-world systems. In this work, we identify …

Robustness Gym: Unifying the NLP Evaluation Landscape

K Goel, N Rajani, J Vig, S Tan, J Wu, S Zheng… - arXiv e …, 2021 - ui.adsabs.harvard.edu
Despite impressive performance on standard benchmarks, deep neural networks are often
brittle when deployed in real-world systems. Consequently, recent research has focused on …

[PDF] Robustness Gym: Unifying the NLP Evaluation Landscape

K Goel, N Rajani, J Vig, Z Taschdjian, M Bansal… - NAACL-HLT …, 2021 - aclanthology.org
Despite impressive performance on standard benchmarks, natural language processing
(NLP) models are often brittle when deployed in real-world systems. In this work, we identify …