Authors
Hugo Gonçalo Oliveira, João Ferreira, José Santos, Pedro Fialho, Ricardo Rodrigues, Luísa Coheur, Ana Alves
Publication date
2020/5
Conference
Proceedings of the 12th Language Resources and Evaluation Conference
Pages
5442-5449
Description
We present AIA-BDE, a corpus of 380 domain-oriented FAQs in Portuguese and their variations, i.e., paraphrases or entailed questions, created either manually, by humans, or automatically, with Google Translate. It aims to be used as a benchmark for FAQ retrieval and automatic question answering, but may be useful in other contexts, such as the development of task-oriented dialogue systems or of models for natural language inference in an interrogative context. We also report on two experiments. Matching variations with their original questions was not trivial for a set of unsupervised baselines, especially for the manually created variations. Besides the high performance obtained with ELMo and BERT embeddings, an Information Retrieval system was surprisingly competitive when considering only the first hit. In the second experiment, text classifiers were trained with the original questions and tested on assigning each variation to one of three possible sources, or on labeling it as out-of-domain. Here, the difference between manual and automatic variations was not as significant.
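The first experiment matches each variation against the original FAQs and, in the "first hit" setting, counts a match only when the top-ranked original is the correct one. As a rough illustration of that evaluation, the sketch below assumes a plain TF-IDF bag-of-words retriever scored with cosine similarity. This is a simple stand-in chosen for illustration, not the Information Retrieval system or the ELMo/BERT baselines used by the authors. The helper names (`tokenize`, `build_idf`, `tfidf`, `cosine`, `top1_accuracy`) and the toy Portuguese questions are hypothetical and are not taken from AIA-BDE.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on word characters; \w matches accented letters such as "ã".
    return re.findall(r"\w+", text.lower())


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency computed over the original FAQ questions.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times IDF; terms never seen in the FAQ collection are dropped.
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top1_accuracy(originals: list[str], variations: list[tuple[str, int]]) -> float:
    # Each variation is (text, index of its original question in `originals`).
    docs = [tokenize(q) for q in originals]
    idf = build_idf(docs)
    vectors = [tfidf(doc, idf) for doc in docs]
    hits = 0
    for text, gold in variations:
        query = tfidf(tokenize(text), idf)
        scores = [cosine(query, v) for v in vectors]
        # Only the first (top-ranked) hit counts; ties go to the lowest index.
        best = max(range(len(vectors)), key=scores.__getitem__)
        hits += int(best == gold)
    return hits / len(variations)


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from AIA-BDE.
    originals = [
        "Como posso renovar o cartão de cidadão?",
        "Quanto custa registar uma empresa?",
        "Onde posso pedir o passaporte?",
    ]
    variations = [
        ("Como renovo o meu cartão de cidadão?", 0),
        ("Qual é o custo de registar uma empresa?", 1),
        ("Em que local peço um passaporte?", 2),
    ]
    print(top1_accuracy(originals, variations))  # 1.0
```

Because this baseline scores only shared surface terms, a paraphrase that reuses few words of its original can rank the wrong FAQ first; that is one plausible reason why lexically distant variations are harder for unsupervised lexical baselines, and why embedding-based matching can help.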
Total citations
Citations per year: chart covering 2020 to 2023
Scholar articles
HG Oliveira, J Ferreira, J Santos, P Fialho, R Rodrigues… - Proceedings of the Twelfth Language Resources and …, 2020