Supervised and unsupervised neural approaches to text readability

M Martinc, S Pollak, M Robnik-Šikonja - Computational Linguistics, 2021 - direct.mit.edu
We present a set of novel neural supervised and unsupervised approaches for determining
the readability of documents. In the unsupervised setting, we leverage neural language …

The Janes project: language resources and tools for Slovene user generated content

D Fišer, N Ljubešić, T Erjavec - Language Resources and Evaluation, 2020 - Springer
The paper presents the results of the Janes project, which aimed to develop language
resources and tools for Slovene user generated content. The paper first describes the 200 …

Normalising Slovene data: historical texts vs. user-generated content

N Ljubešić, K Zupan, D Fišer, T Erjavec - Proceedings of the 13th …, 2016 - academia.edu
The paper presents two manually annotated Slovene language text normalisation datasets,
one of historical texts and the other of tweets, and proposes several variants of character …

Gigafida 2.0: the reference corpus of written standard Slovene

S Krek, Š Arhar Holdt, T Erjavec, J Čibej… - Proceedings of the …, 2020 - aclanthology.org
We describe a new version of the Gigafida reference corpus of Slovene. In addition to
updating the corpus with new material and annotating it with better tools, the focus of the …

Introducing the CURLICAT corpora: seven-language domain specific annotated corpora from curated sources

T Váradi, B Nyéki, S Koeva, M Tadić… - Proceedings of the …, 2022 - aclanthology.org
This article presents the current outcomes of the CURLICAT CEF Telecom project, which
aims to collect and deeply annotate a set of large corpora from selected domains. The …

Discovering automated lexicography: The case of the Slovene lexical database

P Gantar, I Kosem, S Krek - International Journal of Lexicography, 2016 - academic.oup.com
In this paper, we describe the compilation of the Slovene Lexical Database, with the main focus
on developing the methodology to improve the tools used for lexicographic analysis …

Distributional modelling for semantic shift detection

D Fišer, N Ljubešić - International Journal of Lexicography, 2019 - academic.oup.com
This paper gives an overview of distributional modelling of word meaning for contemporary
lexicography. We also apply it in a case study on automatic semantic shift detection in …

Modernising historical Slovene words

Y Scherrer, T Erjavec - Natural Language Engineering, 2016 - cambridge.org
We propose a language-independent word normalisation method and exemplify it on
modernising historical Slovene words. Our method relies on character-level statistical …

Collocations dictionary of modern Slovene

I Kosem, S Krek, P Gantar, Š Arhar Holdt… - Proceedings of the …, 2018 - researchgate.net
The paper presents the compilation of the Collocations Dictionary of Modern Slovene, a new
resource targeting the language production needs of Slovene speakers. An important aspect …

Predicting the level of text standardness in user-generated content

N Ljubešić, D Fišer, T Erjavec, J Čibej… - Proceedings of the …, 2015 - aclanthology.org
Non-standard language as it appears in user-generated content has recently attracted much
attention. This paper proposes that non-standardness comes in two basic varieties, technical …