A generalized method for automated multilingual loanword detection

A Nath, SM Saravani, I Khebour… - Proceedings of the …, 2022 - aclanthology.org
Loanwords are words incorporated from one language into another without translation.
Suppose two words from distantly-related or unrelated languages sound similar and have a …

Borrowing or codeswitching? Annotating for finer-grained distinctions in language mixing

EÁ Mellado, C Lignos - arXiv preprint arXiv:2206.04973, 2022 - arxiv.org
We present a new corpus of Twitter data annotated for codeswitching and borrowing
between Spanish and English. The corpus contains 9,500 tweets annotated at the token …

Characterizing Spans for Sequence Labeling: A Case on Anglicism Detection

EÁ Mellado, J Gonzalo - Procesamiento del lenguaje natural, 2024 - journal.sepln.org
We propose a set of formal dimensions to characterize spans in sequence labeling
evaluation. We apply them to a dataset and model results obtained for anglicism detection in …

Automatic Detection of Borrowings in Low-Resource Languages of the Caucasus: Andic branch

K Zaitsev, A Minchenko - Proceedings of the first workshop on …, 2022 - aclanthology.org
Linguistic borrowings occur in all languages. Andic languages of the Caucasus have
borrowings from different donor-languages like Russian, Arabic, Persian. To automatically …

Analysis of Pre-trained Language Models in Text Classification for Use in Spanish Medical Records Anonymization

DJM Acosta, JDP Aguilar - 2023 IEEE Colombian Caribbean …, 2023 - ieeexplore.ieee.org
In order to facilitate the utilization of healthcare data for research purposes while ensuring
patient privacy, the removal of personal health information (PHI) is imperative. This process …