Data Transformation to Construct a Dataset for Generating Entity-Relationship Model from Natural Language

Authors

Zhenwen Li, Jian-Guang Lou, Tao Xie

Publication date

2023/12/21

Journal

arXiv preprint arXiv:2312.13694

Description

To reduce the manual cost of designing ER models, recent approaches have addressed the task of NL2ERM, i.e., automatically generating entity-relationship (ER) models from natural language (NL) utterances such as software requirements. These approaches are typically rule-based: they rely on rigid heuristic rules and therefore generalize poorly to the many linguistic ways of describing the same requirement. Deep-learning-based models generalize better than rule-based approaches, but such models are lacking for NL2ERM because no large-scale dataset exists. To address this issue, in this paper we report our insight that the NL2ERM task is highly similar to the increasingly popular text-to-SQL task, and we propose a data transformation algorithm that converts existing text-to-SQL data into NL2ERM data. We apply this algorithm to Spider, one of the most popular text-to-SQL datasets, and also collect additional data entries with different NL types, obtaining a large-scale NL2ERM dataset. Because NL2ERM can be viewed as a special information extraction (IE) task, we train two state-of-the-art IE models on our dataset. The experimental results show that both models achieve high performance and outperform existing baselines.
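The abstract does not describe how the transformation works. As a rough, hypothetical illustration of why text-to-SQL data can yield ER models, the Python sketch below assumes that each text-to-SQL example supplies a database schema plus the set of tables its SQL query uses. It maps those tables to entities, their non-foreign-key columns to attributes, and foreign keys between them to relationships. The `Table`, `ERModel`, and `schema_to_er` names, the schema format, and the toy schema are all invented for illustration; this is a generic schema-to-ER mapping, not the authors' algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """A relational table in a text-to-SQL schema (hypothetical format)."""
    name: str
    columns: list[str]
    # Each foreign key: (local column, referenced table, referenced column)
    foreign_keys: list[tuple[str, str, str]] = field(default_factory=list)


@dataclass
class ERModel:
    # Entity name -> attribute names
    entities: dict[str, list[str]]
    # (entity, entity) pairs derived from foreign keys
    relationships: list[tuple[str, str]]


def schema_to_er(tables: list[Table], used: set[str]) -> ERModel:
    """Map the tables a query uses to a simple ER model.

    Assumption: each used table becomes an entity, its non-foreign-key
    columns become attributes, and each foreign key between two used
    tables becomes a relationship. Not the paper's actual algorithm.
    """
    by_name = {t.name: t for t in tables if t.name in used}
    entities: dict[str, list[str]] = {}
    relationships: list[tuple[str, str]] = []
    for table in by_name.values():
        fk_cols = {col for col, _, _ in table.foreign_keys}
        entities[table.name] = [c for c in table.columns if c not in fk_cols]
        for _, ref_table, _ in table.foreign_keys:
            if ref_table in by_name:  # keep only links inside the used subset
                relationships.append((table.name, ref_table))
    return ERModel(entities, relationships)


# Toy example (invented): a question paired with the tables its SQL query uses.
schema = [
    Table("author", ["author_id", "name"]),
    Table("paper", ["paper_id", "title", "author_id"],
          [("author_id", "author", "author_id")]),
    Table("venue", ["venue_id", "name"]),
]
question = "List the titles of papers and the names of their authors."
er = schema_to_er(schema, used={"author", "paper"})
nl2erm_example = (question, er)  # one hypothetical (NL, ER model) training pair

print(er.entities)       # {'author': ['author_id', 'name'], 'paper': ['paper_id', 'title']}
print(er.relationships)  # [('paper', 'author')]
```

Passing the used tables in explicitly sidesteps SQL parsing; a real pipeline would have to extract them from each query.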
