Schema matching using pre-trained language models

Y Zhang, A Floratou, J Cahoon… - 2023 IEEE 39th …, 2023 - ieeexplore.ieee.org
2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023. ieeexplore.ieee.org
Schema matching over relational data has been studied for more than two decades. However, the state-of-the-art methods do not address key modern-day challenges encountered in real customer scenarios, namely: 1) no access to the source (customer) data due to privacy constraints, 2) target schema with a much larger number of entities and attributes compared to the source schema, and 3) different but semantically equivalent entity and attribute names in the source and target schemata. In this paper, we address these shortcomings. Using real-world customer schemata, we demonstrate that existing linguistic matching approaches have low accuracy. Next, we propose the Learned Schema Mapper (LSM), a novel linguistic schema matching system that leverages the natural language understanding capabilities of pre-trained language models to improve the overall accuracy. Combining this with active learning and a smart attribute selection strategy that selects the most informative attributes for users to label, LSM can significantly reduce the overall human labeling cost. Experimental results demonstrate that users can correctly match their full schema while saving as much as 81% of the labeling cost compared to manual labeling.
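
To make the idea of name-only matching with attribute selection concrete, here is a minimal sketch. It is not the LSM implementation: the abstract does not describe the model or the selection strategy in detail, so this sketch swaps in a simple character-level string similarity (Python's difflib) where LSM would use a pre-trained language model, and it uses a smallest-margin uncertainty heuristic as an assumed stand-in for the "smart attribute selection strategy". All function names and the toy schemata are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): character-level similarity of names.

    LSM would use a pre-trained language model here; difflib is only a
    placeholder so that the sketch runs without extra dependencies.
    """
    def normalize(s: str) -> str:
        return s.lower().replace("_", " ")

    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank_targets(source_attr: str, target_attrs: list[str]) -> list[tuple[float, str]]:
    """Return (score, target attribute) pairs, best match first."""
    scored = [(similarity(source_attr, t), t) for t in target_attrs]
    return sorted(scored, reverse=True)


def pick_attribute_to_label(source_attrs: list[str], target_attrs: list[str]) -> str:
    """Pick the source attribute whose top two candidates are closest in score.

    A small margin means the matcher is least sure, so a human label is
    assumed to be most informative there (hypothetical heuristic).
    """
    def margin(source_attr: str) -> float:
        ranked = rank_targets(source_attr, target_attrs)
        return ranked[0][0] - ranked[1][0]

    return min(source_attrs, key=margin)


# Toy schemata (hypothetical): only names are used, never the customer data.
source = ["cust_name", "zip"]
target = ["customer_name", "postal_code", "order_date"]

best_score, best_target = rank_targets("cust_name", target)[0]
to_label = pick_attribute_to_label(source, target)
```

The sketch only uses schema names, mirroring the abstract's constraint that the source data is inaccessible. In an active-learning loop, the label obtained for the selected attribute would be fed back to the matcher before the next selection round; that loop is omitted here.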