Using English baits to catch Serbian multi-word terminology

C Krstev, B Šandrih, R Stanković… - Proceedings of the …, 2018 - aclanthology.org

C Krstev, B Šandrih, R Stanković, M Mladenović

Proceedings of the Eleventh International Conference on Language …, 2018•aclanthology.org

Abstract

In this paper we present the first results in bilingual terminology extraction. Our hypothesis is that if domain terminology exists for a source language, together with a domain corpus aligned between the source and a target language, then it is possible to extract the terminology for the target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for the source language, a terminology extractor for the target language, and a tool for word and chunk alignment. In this first experiment the source language is English, the target language is Serbian, and the domain is Library and Information Science, for which a bilingual terminological dictionary exists. Our term extractor is based on e-dictionaries and shallow parsing, and for word alignment we use GIZA++. At the end of the procedure we included a supervised binary classifier that decides whether an extracted term is a valid domain term. The classifier was evaluated in a 5-fold cross-validation setting on a slightly unbalanced dataset, achieving an average F-score of 89%. In the experiment our system extracted 846 different Serbian domain phrases, 515 of which were not present in the existing domain terminology.
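The core idea in the abstract, using known English terms as "baits" and following word alignments to find Serbian candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_terms`, the representation of alignments as `(source_index, target_index)` pairs, and the toy sentence pair are all assumptions made for illustration. The sketch also omits the Serbian term extractor and the final binary classifier that the paper applies to filter candidates.

```python
from typing import List, Set, Tuple

def project_terms(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: Set[Tuple[int, int]],
    src_terms: List[Tuple[str, ...]],
) -> List[Tuple[str, str]]:
    """Project known source-language terms onto the target sentence.

    alignment holds (source_index, target_index) pairs, an assumed
    simplification of what a word aligner produces.
    """
    src_lower = [t.lower() for t in src_tokens]
    candidates = []
    for term in src_terms:
        n = len(term)
        for start in range(len(src_lower) - n + 1):
            # Look for the term as a contiguous token sequence.
            if tuple(src_lower[start:start + n]) != term:
                continue
            # Collect every target position aligned to a token of the term.
            tgt_idx = sorted(j for (i, j) in alignment if start <= i < start + n)
            if not tgt_idx:
                continue
            # Take the smallest contiguous target span covering those positions.
            lo, hi = tgt_idx[0], tgt_idx[-1]
            candidates.append((" ".join(term), " ".join(tgt_tokens[lo:hi + 1])))
    return candidates

# Toy, hand-made example (not data from the paper).
src = ["The", "library", "catalogue", "is", "online"]
tgt = ["Bibliotečki", "katalog", "je", "dostupan", "onlajn"]
align = {(1, 0), (2, 1), (3, 2), (4, 4)}
terms = [("library", "catalogue")]

print(project_terms(src, tgt, align, terms))
```

In a fuller pipeline, each projected span would presumably be kept only if the target-language extractor also recognizes it as a well-formed multi-word unit, and would then be passed to the classifier.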

Cited by: 11 · All 7 versions