MORE: Toward Improving Author Name Disambiguation in Academic Knowledge Graphs

J. Gong, X. Fang, J. Peng, Y. Zhao, J. Zhao, C. Wang, Y. Li, J. Zhang, S. Drew
International Journal of Machine Learning and Cybernetics, 2024, Springer
Abstract
Author name disambiguation (AND) is a fundamental task in knowledge alignment for building a knowledge graph network or an online academic search system. Existing AND algorithms tend to cause over-splitting and over-merging of papers, severely jeopardizing the performance of downstream tasks. In this paper, we demonstrate the problem of paper over-splitting and over-merging when constructing an academic knowledge graph. To address these problems, we systematically investigate and propose a unified architecture, MORE, which utilizes LightGBM and HAC for paper clustering, as well as HGAT for both cluster alignment and knowledge graph representation learning. Specifically, we first propose a novel representation learning method that leverages OAG-BERT to learn paper entity embeddings and utilizes SimCSE to regularize the anisotropic space of the pre-trained embeddings. We then apply LightGBM to compute the similarity matrix of papers from the entity embeddings. We also use hierarchical agglomerative clustering (HAC) to group clusters and alleviate over-merging. Finally, considering co-author relationships, we improve the HGAT model with a hard-cross graph attention mechanism to generate semantic and structural embeddings. Experimental results on two large real-world datasets show that our proposed method achieves a 6% to 16% improvement in F1-score over the baseline models.
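The clustering step described above can be illustrated with a minimal sketch: hierarchical agglomerative clustering over cosine distances between paper embeddings, stopping merges once the average inter-cluster distance exceeds a threshold. This is a generic HAC illustration, not the paper's actual pipeline: the toy vectors, the cosine metric, and the threshold value are all assumptions (MORE learns its similarity matrix with LightGBM over OAG-BERT/SimCSE embeddings).

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def hac(points, threshold):
    """Average-linkage agglomerative clustering over cosine distances.

    Repeatedly merges the two closest clusters until the smallest
    average inter-cluster distance exceeds `threshold` -- the knob
    that trades off over-splitting (threshold too low) against
    over-merging (threshold too high).
    """
    n = len(points)
    clusters = [[i] for i in range(n)]
    dist = [[cosine_distance(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between clusters a and b.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # No pair is close enough to merge.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy "paper embeddings": two tight groups standing in for two
# authors who share a name (real embeddings would come from a
# model such as OAG-BERT; these vectors are made up).
papers = [
    [1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.0, 0.05, 0.05],  # author A
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],                      # author B
]
print(hac(papers, threshold=0.5))  # Two clusters: {0, 1, 2} and {3, 4}
```

With the threshold at 0.5, the three papers of author A and the two of author B each stay in their own cluster; raising the threshold toward 1.0 would merge them, which is exactly the over-merging failure mode the abstract describes.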