Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Findings of the VarDial evaluation campaign 2017

M Zampieri, S Malmasi, N Ljubešić… - Proceedings of the …, 2017 - aclanthology.org
We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …

Language Identification and Morphosyntactic Tagging. The Second VarDial Evaluation Campaign.

M Zampieri, S Malmasi, P Nakov, A Ali, S Shon, J Glass… - 2018 - repository.ubn.ru.nl
We present the results and the findings of the Second VarDial Evaluation Campaign on
Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The …

Natural language processing for similar languages, varieties, and dialects: A survey

M Zampieri, P Nakov, Y Scherrer - Natural Language Engineering, 2020 - cambridge.org
There has been a lot of recent interest in the natural language processing (NLP) community
in the computational processing of language varieties and dialects, with the aim to improve …

Language variety identification with true labels

M Zampieri, K North, T Jauhiainen, M Felice… - arXiv preprint arXiv …, 2023 - arxiv.org
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …

Tübingen-Oslo at SemEval-2018 task 2: SVMs perform better than RNNs in emoji prediction

Ç Çöltekin, T Rama - … of the 12th international workshop on …, 2018 - aclanthology.org
This paper describes our participation in the SemEval-2018 task Multilingual Emoji
Prediction. We participated in both English and Spanish subtasks, experimenting with …

HeLI-based experiments in Swiss German dialect identification

TS Jauhiainen, HA Jauhiainen… - Workshop on NLP for …, 2018 - researchportal.helsinki.fi
In this paper we present the experiments and results by the SUKI team in the German
Dialect Identification shared task of the VarDial 2018 Evaluation Campaign. Our submission …

Language discrimination and transfer learning for similar languages: Experiments with feature combinations and adaptation

N Wu, E DeMattos, KH So, P Chen… - Proceedings of the Sixth …, 2019 - aclanthology.org
This paper describes the work done by team tearsofjoy participating in the VarDial 2019
Evaluation Campaign. We developed two systems based on Support Vector Machines: SVM …

Language model adaptation for language and dialect identification of text

T Jauhiainen, K Lindén, H Jauhiainen - Natural Language …, 2019 - cambridge.org
This article describes an unsupervised language model (LM) adaptation approach that can
be used to enhance the performance of language identification methods. The approach is …

Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models

TS Jauhiainen, HA Jauhiainen… - Workshop on NLP for …, 2019 - researchportal.helsinki.fi
This paper describes the language identification systems used by the SUKI team in the
Discriminating between the Mainland and Taiwan variation of Mandarin Chinese (DMT) and …