Aksharantar: Open Indic-language transliteration datasets and models for the next billion users

Y Madhani, S Parthan, P Bedekar, G Nc… - Findings of the …, 2023 - aclanthology.org
Transliteration is very important in the Indian language context due to the usage of multiple
scripts and the widespread use of romanized inputs. However, few training and evaluation …

Extracting bilingual terminologies from comparable corpora

A Aker, ML Paramita, R Gaizauskas - Proceedings of the 51st …, 2013 - aclanthology.org
In this paper we present a method for extracting bilingual terminologies from comparable
corpora. In our approach we treat bilingual term extraction as a classification problem. For …

Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics

K Gupta, M Choudhury, K Bali - LREC, 2012 - lrec.elra.info
This paper describes a method to mine Hindi-English transliteration pairs from online Hindi
song lyrics. The technique is based on the observations that lyrics are transliterated word-by …

Machine transliteration and transliterated text retrieval: a survey

DK Prabhakar, S Pal - Sādhanā, 2018 - Springer
Users of the WWW across the globe are increasing rapidly. According to Internet live stats
there are more than 3 billion Internet users worldwide today and the number of non-English …

“They Are Out There, If You Know Where to Look”: Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval

R Udupa, SK, A Bakalov, A Bhole - … on IR Research, ECIR 2009, Toulouse …, 2009 - Springer
It is well known that the use of a good Machine Transliteration system improves the retrieval
performance of Cross-Language Information Retrieval (CLIR) systems when the query and …

Mint: A method for effective and scalable mining of named entity transliterations from large comparable corpora

R Udupa, K Saravanan, A Kumaran… - Proceedings of the …, 2009 - aclanthology.org
In this paper, we address the problem of mining transliterations of Named Entities (NEs) from
large comparable corpora. We leverage the empirical fact that multilingual news articles with …

Mining a Persian–English comparable corpus for cross-language information retrieval

HB Hashemi, A Shakery - Information Processing & Management, 2014 - Elsevier
Knowledge acquisition and bilingual terminology extraction from multilingual
corpora are challenging tasks for cross-language information retrieval. In this study, we …

EM-based hybrid model for bilingual terminology extraction from comparable corpora

L Lee, A Aw, M Zhang, H Li - Coling 2010: Posters, 2010 - aclanthology.org
In this paper, we present an unsupervised hybrid model which combines statistical, lexical,
linguistic, contextual, and temporal features in a generic EM-based framework to harvest …

机器音译研究综述(Survey on Machine Transliteration)

Z Li, Z Wang, X Zhao - Proceedings of the 21st Chinese National …, 2022 - aclanthology.org
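Machine transliteration is the process of automatically converting text from one language to another based on phonetic similarity.
It is a subtask of machine translation that focuses on translating phonetic information. Transliteration makes it possible to know, in the other language, the source word's …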

Extracting bilingual terms from the Web

R Gaizauskas, ML Paramita, E Barker… - … Journal of Theoretical …, 2015 - jbe-platform.com
In this paper we make two contributions. First, we describe a multi-component system called
BiTES (Bilingual Term Extraction System) designed to automatically gather domain-specific …