Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions

C Ramisch, S Cordeiro, A Savary, V Vincze… - Proceedings of the …, 2018 - hal.science
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal
multi-word expressions. We present the annotation methodology, focusing on changes from …

Evaluation of transfer learning for Polish with a text-to-text model

A Chrabrowa, Ł Dragan, K Grzegorczyk… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to …

The reference corpus of the contemporary Romanian language (CoRoLa)

VB Mititelu, D Tufiş, E Irimia - Proceedings of the Eleventh …, 2018 - aclanthology.org
We present here the largest publicly available corpus of Romanian. Its written component
contains 1,257,752,812 tokens, distributed, in an unbalanced way, in several language …

Sociolinguistics in East Central Europe

M Kontra, M Sloboda, J Nekvapil… - … Around the World, 2023 - taylorfrancis.com
Four countries in East Central Europe are discussed. For Hungary, Kontra describes urban
dialectology, the contact varieties of Hungarian, and language policy and rights in Hungary …

A Survey of Large Language Models for European Languages

W Ali, S Pyysalo - arXiv preprint arXiv:2408.15040, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention due to their high
performance on a wide range of natural language tasks since the release of ChatGPT. The …

The design of semi-lexicality

H Klockmann - Utrecht: LOT Publications, 2017 - lotpublications.nl
Semi-lexicality refers to lexical items which show both lexical and functional properties …

A deep learning model of spatial distance and named entity recognition (SD-NER) for flood mark text classification

R Szczepanek - Water, 2023 - mdpi.com
Information on historical flood levels can be communicated verbally, in documents, or in the
form of flood marks. The latter are the most useful from the point of view of public awareness …

Latvian national corpora collection – korpuss.lv

B Saulīte, R Darģis, N Gruzitis, I Auziņa… - Proceedings of the …, 2022 - aclanthology.org
LNCC is a diverse collection of Latvian language corpora representing both written and
spoken language and is useful for both linguistic research and language modelling. The …

Arguments and adjuncts in Universal Dependencies

A Przepiórkowski, A Patejuk - Proceedings of the 27th …, 2018 - aclanthology.org
The aim of this paper is to argue for a coherent Universal Dependencies approach to the
core vs. non-core distinction. We demonstrate inconsistencies in the current version 2 of UD …

Coordination of unlike grammatical cases (and unlike categories)

A Przepiórkowski - Language, 2022 - muse.jhu.edu
It is often claimed that conjuncts in coordinate structures must be alike in various ways, in
particular, that they should have the same syntactic category and the same grammatical …