Bangla is spoken as the majority language in Bangladesh and the state of West Bengal in India, and as a minority language in several other Indian states. With almost 200 million …
The world-wide proliferation of digital communications has created the need for language and speech processing systems for under-resourced languages. Developing such systems is …
T Erjavec - Handbook of linguistic annotation, 2017 - Springer
The chapter presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic …
J Hana, A Feldman, K Aharodnik - … of the 5th ACL-HLT Workshop …, 2011 - aclanthology.org
The paper describes a tagger for Old Czech (1200-1500 AD), a fusional language with rich morphology. The practical restrictions (no native speakers, limited corpora and lexicons …
M Dickinson - Proceedings of the 23rd International Conference …, 2010 - aclanthology.org
To speed up the process of categorizing learner errors and obtaining data for languages which lack error-annotated data, we describe a linguistically-informed method for generating …
We show that a carefully crafted probabilistic morphological analyzer significantly outperforms a reasonable, naive baseline for Syriac. Syriac is an under-resourced Semitic …
This paper describes the implementation of a resource-light approach, cross-language transfer, to build and annotate a historical corpus for Old Occitan. Our approach transfers …
In developing deep syntactic parsers for unrestricted text, the main difficulty to overcome is modeling the lexicon. Traditionally, two strategies have …
I Doval - Research in Corpus Linguistics, 2017 - ricl.aelinco.es
This paper reviews the author's experiences of tokenizing and POS tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts …