WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

B Minixhofer, F Paischer, N Rekabsaz - arXiv preprint arXiv:2112.06598, 2021 - arxiv.org
Large pretrained language models (LMs) have become the central building block of many
NLP applications. Training these models requires ever more computational resources and …

Deep encoding of etymological information in TEI

J Bowers, L Romary - Journal of the Text Encoding …, 2016 - journals.openedition.org
In this paper we provide a systematic and comprehensive set of modeling principles for
representing etymological data in digital dictionaries using TEI. The purpose is to integrate …

Creating lexical resources in TEI P5: A schema for multi-purpose digital dictionaries

G Budin, S Majewski, K Mörth - Journal of the Text …, 2012 - journals.openedition.org
Although most of the relevant dictionary productions of the recent past have relied on digital
data and methods, there is little consensus on formats and standards. The Institute for …

Laying the foundations for a diachronic dictionary of Tunis Arabic: A first glance at an evolving new language resource

K Moerth, S Procházka, I Dallaji - Proceedings of the XVI EURALEX …, 2014 - euralex.org
Arabic lexicography has a long tradition. However, at the time of writing this report, there
exist only a very few digital products, let alone products documenting Arabic dialects. Our …

Inducing discourse marker inventories from lexical knowledge graphs

C Chiarcos - Proceedings of the Thirteenth Language Resources …, 2022 - aclanthology.org
Discourse marker inventories are important tools for the development of both discourse
parsers and corpora with discourse annotations. In this paper we explore the potential of …

Modelling frequency data: Methodological considerations on the relationship between dictionaries and corpora

G Budin, K Mörth, L Romary - TEI Conference 2013, 2013 - inria.hal.science
The research questions addressed in our paper stem from a bundle of linguistically focused
projects which, among other activities, also create glossaries and dictionaries which are …

Modeling Frequency Data: Methodological Considerations on the Relationship between Dictionaries and Corpora

K Mörth, L Romary, G Budin… - Journal of the Text …, 2014 - journals.openedition.org
Academic dictionary writing is making greater and greater use of the TEI Guidelines'
dictionary module. And as increasing numbers of TEI dictionaries become available, there is …

Journal of the Text Encoding Initiative

G Budin, S Majewski, K Mörth - methods, 2000 - researchgate.net
1. Background. Lexicography, the art of compiling dictionaries, is one of the oldest
branches of linguistics. All remnants of early lexicographic writings stem from Asia, and the …