Probing pretrained language models for lexical semantics

I Vulić, EM Ponti, R Litschko, G Glavaš… - Proceedings of the …, 2020 - aclanthology.org
The success of large pretrained language models (LMs) such as BERT and RoBERTa has
sparked interest in probing their representations, in order to unveil what types of knowledge …

IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP

F Koto, A Rahimi, JH Lau, T Baldwin - arXiv preprint arXiv:2011.00677, 2020 - arxiv.org
Although the Indonesian language is spoken by almost 200 million people and is the 10th
most spoken language in the world, it is under-represented in NLP research. Previous work …

Emerging cross-lingual structure in pretrained language models

S Wu, A Conneau, H Li, L Zettlemoyer… - arXiv preprint arXiv …, 2019 - arxiv.org
We study the problem of multilingual masked language modeling, i.e., the training of a single
model on concatenated text from multiple languages, and present a detailed study of several …

Participatory research for low-resourced machine translation: A case study in African languages

W Nekoto, V Marivate, T Matsila, T Fasubaa… - arXiv preprint arXiv …, 2020 - arxiv.org
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to
low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a …

The NLP cookbook: modern recipes for transformer based deep learning architectures

S Singh, A Mahmood - IEEE Access, 2021 - ieeexplore.ieee.org
In recent years, Natural Language Processing (NLP) models have achieved phenomenal
success in linguistic and semantic tasks like text classification, machine translation, cognitive …

Mitigating language-dependent ethnic bias in BERT

J Ahn, A Oh - arXiv preprint arXiv:2109.05704, 2021 - arxiv.org
BERT and other large-scale language models (LMs) contain gender and racial bias. They
also exhibit other dimensions of social bias, most of which have not been studied in depth …

Automatic classification of sexism in social networks: An empirical study on Twitter data

F Rodríguez-Sánchez, J Carrillo-de-Albornoz… - IEEE …, 2020 - ieeexplore.ieee.org
During the last decade, hateful and sexist content towards women is being increasingly
spread on social networks. The exposure to sexist speech has serious consequences to …

Multilingual alignment of contextual word representations

S Cao, N Kitaev, D Klein - arXiv preprint arXiv:2002.03518, 2020 - arxiv.org
We propose procedures for evaluating and strengthening contextual embedding alignment
and show that they are useful in analyzing and improving multilingual BERT. In particular …

CoSDA-ML: Multi-lingual code-switching data augmentation for zero-shot cross-lingual NLP

L Qin, M Ni, Y Zhang, W Che - arXiv preprint arXiv:2006.06402, 2020 - arxiv.org
Multi-lingual contextualized embeddings, such as multilingual-BERT (mBERT), have shown
success in a variety of zero-shot cross-lingual tasks. However, these models are limited by …

A survey of syntactic-semantic parsing based on constituent and dependency structures

MS Zhang - Science China Technological Sciences, 2020 - Springer
Syntactic and semantic parsing has been investigated for decades, which is one primary
topic in the natural language processing community. This article aims for a brief survey on …