… English and evaluate it under different generalization regimes… all recent years of the WMT Metrics Shared task (2017 to 2019, … We evaluate the agreement between the automatic metrics …
… We also provide examples containing an input coupled with correct and incorrect generations and show that distinguishing them is a nuanced task requiring knowledge about the …
… sample-based evaluation metrics for GANs, and address the problem of how to evaluate the evaluation metrics. We start with a few necessary conditions for metrics to produce …
… generation evaluation metrics exhibit counter-intuitive behaviour. Although the main goal in language generation tasks … in § 3.1 shows that metrics have a preference towards machine-…
N Mulla, P Gharpure - Progress in Artificial Intelligence, 2023 - Springer
… generation techniques broadly based on three different use-cases. We analyze the datasets and metrics used in question generation for … the evaluation metrics for automated evaluation …
… We propose a living benchmark called GEM (Generation, Evaluation, and Metrics) that aims to … in-depth evaluation of model outputs across human and automatic evaluation that aims to …
… two new metrics for evaluating generative models in the class-conditional image generation … We provide an extensive empirical evaluation, comparing the metrics to their unconditional …
… We consider the following crowdsourced datasets, which target utterance generation for spoken dialogue systems. Table 1 shows the number of system outputs for each dataset. Each …
… future approaches to explainable evaluation metrics and their evaluation. We hope that … evaluation metrics and, mediately, also contribute to better and more transparent text generation …