XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond

Authors

Francesco Barbieri, Luis Espinosa Anke, Jose Camacho-Collados

Publication date

2022/6

Conference

Proceedings of the Thirteenth Language Resources and Evaluation Conference

Pages

258-266

Description

Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. Specifically, we provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al., 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages and an XLM-T model fine-tuned on them.

Total citations

Cited by 268

Citations per year: 2021: 10, 2022: 47, 2023: 141, 2024: 68

Scholar articles

XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond

F Barbieri, LE Anke, J Camacho-Collados - arXiv preprint arXiv:2104.12250, 2021

Cited by 199 (all 6 versions)

XLM-T: A multilingual language model toolkit for Twitter*

F Barbieri, LE Anke, J Camacho-Collados - arXiv preprint arXiv:2104.12250, 2021

Cited by 70