MULTEXT-East: morphosyntactic resources for Central and Eastern European languages

T Erjavec - Language resources and evaluation, 2012 - Springer
The paper presents the MULTEXT-East language resources, a multilingual dataset for
language engineering research, focused on the morphosyntactic level of linguistic …

[BOOK] Descriptive grammar of Bangla

AB David - 2015 - books.google.com
Bangla is spoken as the majority language in Bangladesh and the state of West Bengal in
India, and as a minority language in several other Indian states. With almost 200 million …

[PDF] Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages.

P Baumann, JB Pierrehumbert - LREC, 2014 - phon.ox.ac.uk
The world-wide proliferation of digital communications has created the need for language
and speech processing systems for under-resourced languages. Developing such systems is …

MULTEXT-East

T Erjavec - Handbook of linguistic annotation, 2017 - Springer
The chapter presents the MULTEXT-East language resources, a multilingual dataset for
language engineering research, focused on the morphosyntactic level of linguistic …

[PDF] A low-budget tagger for Old Czech

J Hana, A Feldman, K Aharodnik - … of the 5th ACL-HLT Workshop …, 2011 - aclanthology.org
The paper describes a tagger for Old Czech (1200-1500 AD), a fusional language with rich
morphology. The practical restrictions (no native speakers, limited corpora and lexicons …

[PDF] Generating learner-like morphological errors in Russian

M Dickinson - Proceedings of the 23rd International Conference …, 2010 - aclanthology.org
To speed up the process of categorizing learner errors and obtaining data for languages
which lack error-annotated data, we describe a linguistically-informed method for generating …

[BOOK] A probabilistic morphological analyzer for Syriac

PJ McClanahan - 2010 - search.proquest.com
We show that a carefully crafted probabilistic morphological analyzer significantly
outperforms a reasonable, naive baseline for Syriac. Syriac is an under-resourced Semitic …

[PDF] Building an Old Occitan corpus via cross-language transfer.

O Scrivner, S Kübler - KONVENS, 2012 - academia.edu
This paper describes the implementation of a resource-light approach, cross-language
transfer, to build and annotate a historical corpus for Old Occitan. Our approach transfers …

Using lexical information automatically extracted from corpora in the computational syntactic parsing of Portuguese

LFA Araripe - 2011 - repositorio.ufc.br
In developing deep parsers for unrestricted text, the main difficulty to be overcome is modeling the
lexicon. Traditionally, two strategies have …

POS-tagging a bilingual parallel corpus: methods and challenges

I Doval - Research in Corpus Linguistics, 2017 - ricl.aelinco.es
This paper reviews the author's experiences of tokenizing and POS tagging a bilingual
parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts …