Enriching word vectors with subword information

P Bojanowski, E Grave, A Joulin… - Transactions of the …, 2017 - direct.mit.edu
Continuous word representations, trained on large unlabeled corpora, are useful for many
natural language processing tasks. Popular models that learn such representations ignore …

A family of pretrained transformer language models for Russian

D Zmitrovich, A Abramov, A Kalmykov… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformer language models (LMs) are fundamental to NLP research methodologies and
applications in various languages. However, developing such models specifically for the …

A survey of semantic relatedness evaluation datasets and procedures

MA Hadj Taieb, T Zesch, M Ben Aouicha - Artificial Intelligence Review, 2020 - Springer
Semantic relatedness between words is a core concept in natural language processing.
While countless approaches have been proposed, measuring which one works best is still a …

Big BiRD: A large, fine-grained, bigram relatedness dataset for examining semantic composition

S Asaadi, S Mohammad… - Proceedings of the 2019 …, 2019 - aclanthology.org
Bigrams (two-word sequences) hold a special place in semantic composition research since
they are the smallest unit formed by composing words. A semantic relatedness dataset that …

RUSSE'2018: a shared task on word sense induction for the Russian language

A Panchenko, A Lopukhina, D Ustalov… - arXiv preprint arXiv …, 2018 - arxiv.org
The paper describes the results of the first shared task on word sense induction (WSI) for the
Russian language. While similar shared tasks were conducted in the past for some …

subs2vec: Word embeddings from subtitles in 55 languages

J Van Paridon, B Thompson - Behavior Research Methods, 2021 - Springer
This paper introduces a novel collection of word embeddings, numerical representations of
lexical semantics, in 55 languages, trained on a large corpus of pseudo-conversational …

Watset: Automatic induction of synsets from a graph of synonyms

D Ustalov, A Panchenko, C Biemann - arXiv preprint arXiv:1704.07157, 2017 - arxiv.org
This paper presents a new graph-based approach that induces synsets using synonymy
dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted …

Frequency tagging of syntactic structure or lexical properties; a registered MEG study

E Kalenkovich, A Shestakova, N Kazanina - Cortex, 2022 - Elsevier
A traditional view on sentence comprehension holds that the listener parses linguistic input
using hierarchical syntactic rules. Recently, physiological evidence for such a claim has …

Learning to generate word representations using subword information

Y Kim, KM Kim, JM Lee, SK Lee - Proceedings of the 27th …, 2018 - aclanthology.org
Distributed representations of words play a major role in the field of natural language
processing by encoding semantic and syntactic information of words. However, most …

RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian language

I Nikishina, V Logacheva, A Panchenko… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper describes the results of the first shared task on taxonomy enrichment for the
Russian language. The participants were asked to extend an existing taxonomy with …