An isotropy analysis in the multilingual BERT embedding space

S Rajaee, MT Pilehvar - arXiv preprint arXiv:2110.04504, 2021 - arxiv.org
Several studies have explored various advantages of multilingual pre-trained models (such
as multilingual BERT) in capturing shared linguistic knowledge. However, less attention has …
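Isotropy analyses of this kind typically quantify how uniformly the embeddings spread across directions; a measure commonly used in this line of work is the partition-function score of Mu and Viswanath (2018). A minimal sketch with hypothetical NumPy data, not the paper's exact protocol:

```python
import numpy as np

def isotropy_score(W):
    """Partition-function isotropy measure (Mu & Viswanath, 2018): the ratio of
    the minimum to the maximum of Z(c) = sum_i exp(c . w_i), with the probe
    directions c taken as eigenvectors of W^T W. Scores near 1 mean near-isotropy."""
    _, V = np.linalg.eigh(W.T @ W)    # columns of V are the probe directions c
    Z = np.exp(W @ V).sum(axis=0)     # Z(c) for every eigenvector c
    return Z.min() / Z.max()

# Toy check: roughly isotropic Gaussian vectors vs. a common-mean-shifted copy.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 32))
print(isotropy_score(W))          # close to 1
print(isotropy_score(W + 5.0))    # the shared offset drives the score toward 0
```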

Normalization of language embeddings for cross-lingual alignment

PO Aboagye, Y Zheng, CCM Yeh, J Wang… - International …, 2022 - par.nsf.gov
Learning a good transfer function to map the word vectors from two languages into a shared
cross-lingual word vector space plays a crucial role in cross-lingual NLP. It is useful in …
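A standard instance of such a transfer function is an orthogonal map fitted on seed translation pairs after normalizing both spaces. A minimal sketch with hypothetical toy data; the paper's specific normalization schemes are not reproduced here:

```python
import numpy as np

def normalize(X):
    """Mean-center then length-normalize each row: one common pre-alignment
    normalization among the several such schemes studied in this literature."""
    X = X - X.mean(axis=0, keepdims=True)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def procrustes(X_src, Y_tgt):
    """Orthogonal map W minimizing ||X W - Y||_F (orthogonal Procrustes, via SVD)."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Hypothetical toy data: 500 'translation pairs' of 300-d embeddings.
rng = np.random.default_rng(0)
X = normalize(rng.normal(size=(500, 300)))
Q_true = np.linalg.qr(rng.normal(size=(300, 300)))[0]   # hidden ground-truth rotation
Y = X @ Q_true
W = procrustes(X, Y)
print(np.allclose(X @ W, Y))   # True: the rotation is recovered exactly
```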

Alignment of Multilingual Embeddings to Estimate Job Similarities in Online Labour Market

S D'Amico, L Malandri, F Mercorio… - 2024 IEEE 11th …, 2024 - ieeexplore.ieee.org
In recent years, word embeddings (WEs) have proven relevant for studying differences and
similarities among job professions and skills required by the labour market across countries …

Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer

H Xu, K Murray - arXiv preprint arXiv:2204.13869, 2022 - arxiv.org
The current state-of-the-art for few-shot cross-lingual transfer learning first trains on
abundant labeled data in the source language and then fine-tunes with a few examples on …
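Using a simple linear probe as a stand-in for the fine-tuned model, the two-step recipe described here, and the mixed-training alternative the title alludes to, can be sketched as follows; all data and hyperparameters are hypothetical placeholders, and the paper's gradient-optimization component is not reproduced:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-ins for multilingual sentence encodings: abundant labeled
# source-language data plus a handful of labeled target-language examples.
rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(5000, 64)), rng.integers(0, 2, 5000)
X_tgt, y_tgt = rng.normal(size=(8, 64)), rng.integers(0, 2, 8)

# Two-step baseline: train on the source language, then fine-tune few-shot.
clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X_src, y_src, classes=np.array([0, 1]))
for _ in range(10):
    clf.partial_fit(X_tgt, y_tgt)

# Mixed-training alternative: one training run over source plus target shots.
clf_mixed = SGDClassifier(loss="log_loss", random_state=0)
clf_mixed.fit(np.vstack([X_src, X_tgt]), np.concatenate([y_src, y_tgt]))
```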

Unsupervised geometric and topological approaches for cross-lingual sentence representation and comparison

SH Meirom, O Bobrowski - 2022 - qmro.qmul.ac.uk
We propose novel structure-based approaches for the generation and comparison of cross-lingual
sentence representations. We do so by applying geometric and topological methods …

Voices in a Crowd: Searching for Clusters of Unique Perspectives

N Vitsakis, A Parekh, I Konstas - arXiv preprint arXiv:2407.14259, 2024 - arxiv.org
Language models have been shown to reproduce the underlying biases in their training
data, which by default reflect the majority perspective. Proposed solutions aim to capture minority …

SeNSe: embedding alignment via semantic anchors selection

L Malandri, F Mercorio, M Mezzanzanica… - International Journal of …, 2024 - Springer
Word embeddings have proven extremely useful across many NLP applications in recent
years. Several key linguistic tasks, such as machine translation and transfer learning …

Investigating the Effectiveness of Whitening Post-processing Methods on Modifying LLMs Representations

Z Wang, Y Wu - 2023 IEEE 35th International Conference on …, 2023 - ieeexplore.ieee.org
In contemporary natural language processing (NLP) tasks, it is common to use the
representations of large language models (LLMs) directly in downstream applications …
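In this context, whitening usually means centering the representations and linearly rescaling them so their empirical covariance becomes the identity. A minimal PCA-whitening sketch on hypothetical data, one common variant; the paper's exact formulations are not reproduced:

```python
import numpy as np

def whiten(X, eps=1e-8):
    """PCA whitening: center the representations, then rescale along the
    principal axes so the empirical covariance becomes the identity."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt.T / (S / np.sqrt(len(X) - 1) + eps)   # scale each principal axis
    return Xc @ W

# Toy check on correlated features: the whitened covariance is ~identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16)) @ rng.normal(size=(16, 16))
Xw = whiten(X)
print(np.allclose(np.cov(Xw, rowvar=False), np.eye(16), atol=1e-6))  # True
```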

Quantifying domain knowledge in large language models

S Sayenju, R Aygun, B Franks… - … IEEE Conference on …, 2023 - ieeexplore.ieee.org
Transformer-based large language models such as BERT have demonstrated the ability to
derive contextual information from surrounding words. However, when these models …

Induction of Bilingual Dictionaries

S Sharoff, R Rapp, P Zweigenbaum - Building and Using Comparable …, 2023 - Springer
The aim of the Bilingual Lexicon Induction (BLI) task is to produce a bilingual lexicon using a
pair of comparable corpora and either a small set of seed translations (a supervised setting) …
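Once the two embedding spaces have been aligned, the lexicon itself is commonly induced by nearest-neighbour retrieval over the shared space. A minimal sketch assuming pre-aligned, hypothetical NumPy matrices; full BLI systems add refinements such as CSLS to mitigate hubness:

```python
import numpy as np

def induce_lexicon(X_src, Y_tgt, k=1):
    """For each source word, return the indices of its k most cosine-similar
    target words in the aligned space: the basic retrieval step of BLI."""
    Xn = X_src / np.linalg.norm(X_src, axis=1, keepdims=True)
    Yn = Y_tgt / np.linalg.norm(Y_tgt, axis=1, keepdims=True)
    sims = Xn @ Yn.T
    return np.argsort(-sims, axis=1)[:, :k]

# Toy check: source vectors sit near their true translations in the target space.
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 50))                   # hypothetical target vectors
X = Y[:10] + 0.01 * rng.normal(size=(10, 50))    # sources near Y[0..9]
print(induce_lexicon(X, Y).ravel())              # expected: [0 1 2 ... 9]
```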