Neural data-to-text generation with LM-based text augmentation

E Chang, X Shen, D Zhu, V Demberg, H Su - arXiv preprint arXiv …, 2021 - arxiv.org
For many new application domains for data-to-text generation, the main obstacle in training
neural models consists of a lack of training data. While usually large numbers of instances …

Diversifying dialogue generation with non-conversational text

H Su, X Shen, S Zhao, X Zhou, P Hu, R Zhong… - arXiv preprint arXiv …, 2020 - arxiv.org
Neural network-based sequence-to-sequence (seq2seq) models strongly suffer from the
low-diversity problem when it comes to open-domain dialogue generation. As bland and generic …

Neural data-to-text generation via jointly learning the segmentation and correspondence

X Shen, E Chang, H Su, J Zhou, D Klakow - arXiv preprint arXiv …, 2020 - arxiv.org
The neural attention model has achieved great success in data-to-text generation tasks.
Though usually excelling at producing fluent text, it suffers from the problem of information …

Does the order of training samples matter? Improving neural data-to-text generation with curriculum learning

E Chang, HS Yeh, V Demberg - arXiv preprint arXiv:2102.03554, 2021 - arxiv.org
Recent advancements in data-to-text generation largely take on the form of neural
end-to-end systems. Efforts have been dedicated to improving text generation systems by changing …

Jointly improving language understanding and generation with quality-weighted weak supervision of automatic labeling

E Chang, V Demberg, A Marin - arXiv preprint arXiv:2102.03551, 2021 - arxiv.org
Neural natural language generation (NLG) and understanding (NLU) models are
data-hungry and require massive amounts of annotated data to be competitive. Recent …

Learning fine-grained fact-article correspondence in legal cases

J Ge, Y Huang, X Shen, C Li… - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
Automatically recommending relevant law articles to a given legal case has attracted much
attention as it can greatly release human labor from searching over the large database of …

semiPQA: A study on product question answering over semi-structured data

X Shen, G Barlacchi, M Del Tredici… - Proceedings of the …, 2022 - aclanthology.org
Product question answering (PQA) aims to automatically address customer questions to
improve their online shopping experience. Current research mainly focuses on finding …

DART: A lightweight quality-suggestive data-to-text annotation tool

E Chang, J Caplinger, A Marin, X Shen… - arXiv preprint arXiv …, 2020 - arxiv.org
We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for the general
task of labeling structured data with textual descriptions. The tool is implemented as an …

Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

PJ Lin, M Saeed, E Chang… - Proceedings of the 24th …, 2023 - isca-archive.org
Developing effective spoken language processing systems for low-resource languages
poses several challenges due to the lack of parallel data and limited resources for fine …

Deep latent-variable models for text generation

X Shen - arXiv preprint arXiv:2203.02055, 2022 - arxiv.org
Text generation aims to produce human-like natural language output for down-stream tasks.
It covers a wide range of applications like machine translation, document summarization …