Multilingual sentence categorization according to language

GI Kikui - COLING 1996 Volume 2: The 16th International …, 1996 - aclanthology.org

This paper proposes a new algorithm that simultaneously identifies the coding system and
language of a code string fetched from the Internet, especially World-Wide Web. The …

Cited by 205 · Related articles · All 4 versions

[PDF] Language identification from text using n-gram based cumulative frequency addition

B Ahmed, SH Cha, C Tappert - Proceedings of Student/Faculty …, 2004 - researchgate.net

This paper describes the preliminary results of an efficient language classifier using an ad-hoc
Cumulative Frequency Addition of N-grams. The new classification technique is simpler …

Cited by 107 · Related articles · All 3 versions

Language identification from small text samples

KN Murthy, GB Kumar - Journal of Quantitative Linguistics, 2006 - Taylor & Francis

There is an increasing need to deal with multi-lingual documents today. If we could segment
multi-lingual documents language-wise, it would be very useful both for exploration of …

Cited by 78 · Related articles · All 7 versions

[PDF] Automatic identification of close languages-case study: Malay and Indonesian

B Ranaivo-Malançon - ECTI Transactions on Computer and …, 2006 - eprints.usm.my

Identifying the language of an unknown text is not a new problem but what is new is the task
of identifying close languages. Malay and Indonesian as many other languages are very …

Cited by 65 · Related articles · All 4 versions

Hypertextsorten: Definition, Struktur, Klassifikation

G Rehm - 2005 - jlupub.ub.uni-giessen.de

Search engines on the WWW index and search documents at high
speed. Despite the quantitatively impressive results, the quality of the …

Cited by 61 · Related articles · All 2 versions

[PDF] Study of some distance measures for language and encoding identification

AK Singh - Proceedings of the Workshop on Linguistic Distances, 2006 - aclanthology.org

To determine how close two language models (e.g., n-gram models) are, we can use
several distance measures. If we can represent the models as distributions, then the …

Cited by 52 · Related articles · All 11 versions

[PDF] Identification of languages and encodings in a multilingual document

AK Singh, J Gorla - Cahiers du Cental, 2007 - Citeseer

Text on the Web is available in numerous languages and encodings, often not according to
any standards. The number of multilingual documents on the Web is also increasing. The …

Cited by 36 · Related articles · All 4 versions

Language set identification in noisy synthetic multilingual documents

T Jauhiainen, K Lindén, H Jauhiainen - … 2015, Cairo, Egypt, April 14-20 …, 2015 - Springer

In this paper, we reconsider the problem of language identification of multilingual
documents. Automated language identification algorithms have been improving steadily …

Cited by 20 · Related articles · All 8 versions

[PDF] Categorization according to language: A step toward combining linguistic knowledge and statistic learning

E Giguet - International Workshop of Parsing Technologies (IWPT' …, 1995 - hal.science

In this article, we address the problem of categorization according to language by presenting
a method based on natural properties of language which allow us to categorize any kind of …

Cited by 32 · Related articles · All 6 versions

Daniel at the FinSBD-2 task: Extracting Lists and Sentences from PDF Documents: a model-driven end-to-end approach to PDF document analysis

E Giguet, G Lejeune - Second Workshop on Financial Technology and …, 2021 - hal.science

In this paper, we present the method we have designed and implemented for identifying lists
and sentences in PDF documents while participating in the FinSBD-2 Financial Document …

Cited by 6 · Related articles · All 8 versions
