Sequence-to-sequence pretraining for a less-resourced Slovenian language

M Ulčar, M Robnik-Šikonja - Frontiers in Artificial Intelligence, 2023 - frontiersin.org
Large pretrained language models have recently conquered the area of natural
language processing. As an alternative to predominant masked language modeling …

A Survey of Large Language Models for European Languages

W Ali, S Pyysalo - arXiv preprint arXiv:2408.15040, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention due to their high
performance on a wide range of natural language tasks since the release of ChatGPT. The …

Semantic change detection for Slovene language: a novel dataset and an approach based on optimal transport

M Pranjić, K Dobrovoljc, S Pollak, M Martinc - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we focus on the detection of semantic changes in Slovene, a less resourced
Slavic language with two million speakers. Detecting and tracking semantic changes …

Cross-lingual transfer of abstractive summarizer to less-resource language

A Žagar, M Robnik-Šikonja - Journal of Intelligent Information Systems, 2022 - Springer
Automatic text summarization extracts important information from texts and presents the
information in the form of a summary. Abstractive summarization approaches progressed …

The anatomy of specialized knowledge: Comparing experts and non-experts through associations, frames and language models

Š Vintar, A Saksida - Lexicographica, 2023 - degruyter.com
We explore specialized knowledge and aim to show that expert conceptual spaces differ
from those of non-experts. This rather broad research question is addressed from different …

A Method for Selection of Phonetically Balanced Sentences in Read Speech Corpus Design

JZ Gros, B Vesnicer, S Dobrisek - Proceedings of the 30th European …, 2022 - eurasip.org
Sentence selection for speech prompts plays an important role in the process of designing a
speech corpus of read speech, both for speech recognition and speech synthesis. The …

Extending the SSJ Universal Dependencies Treebank for Slovenian: Was it Worth it?

K Dobrovoljc, N Ljubešić - Proceedings of the 16th Linguistic …, 2022 - aclanthology.org
This paper presents the creation and evaluation of a new version of the reference SSJ
Universal Dependencies Treebank for Slovenian, which has been substantially improved …

Collocation ranking: frequency vs semantics

N Ljubešić, N Logar, I Kosem - Slovenščina 2.0: empirične …, 2021 - journals.uni-lj.si
Collocations play a very important role in language description, especially in identifying
meanings of words. Modern lexicography's inevitable part of meaning deduction are lists of …

Corpus-Linguistic Analysis of Speech Communities on Anti-Gender Discourse in Slovene

D Popič, V Gorjanc - Gender a výzkum/Gender and Research, 2023 - ceeol.com
This paper deals with a corpus-linguistic analysis of different text/media types in Slovene with
the aim of finding out whether or not any of the communication channels covered by the …

Data preparation in crowdsourcing for pedagogical purposes: the case of the CrowLL game

TZ Kuhn, ŠA Holdt, I Kosem, C Tiberius… - Slovenščina 2.0 …, 2022 - journals.uni-lj.si
One way to stimulate the use of corpora in language education is by making pedagogically
appropriate corpora, labeled with different types of problems (sensitive content, offensive …