Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES: gradnja, vsebina, uporaba

M Martinc, S Pollak, M Robnik-Šikonja - Computational Linguistics, 2021 - direct.mit.edu

We present a set of novel neural supervised and unsupervised approaches for determining
the readability of documents. In the unsupervised setting, we leverage neural language …

Cited by 165 · All 7 versions

The Janes project: language resources and tools for Slovene user generated content

D Fišer, N Ljubešić, T Erjavec - Language Resources and Evaluation, 2020 - Springer

The paper presents the results of the Janes project, which aimed to develop language
resources and tools for Slovene user generated content. The paper first describes the 200 …

Cited by 46 · All 4 versions

Normalising Slovene data: historical texts vs. user-generated content

N Ljubešić, K Zupan, D Fišer, T Erjavec - Proceedings of the 13th …, 2016 - academia.edu

The paper presents two manually annotated Slovene language text normalisation datasets,
one of historical texts and the other of tweets, and proposes several variants of character …

Cited by 60 · All 8 versions

Gigafida 2.0: the reference corpus of written standard Slovene

S Krek, Š Arhar Holdt, T Erjavec, J Čibej… - Proceedings of the …, 2020 - aclanthology.org

We describe a new version of the Gigafida reference corpus of Slovene. In addition to
updating the corpus with new material and annotating it with better tools, the focus of the …

Cited by 26 · All 6 versions

Introducing the CURLICAT corpora: seven-language domain specific annotated corpora from curated sources

T Váradi, B Nyéki, S Koeva, M Tadić… - Proceedings of the …, 2022 - aclanthology.org

This article presents the current outcomes of the CURLICAT CEF Telecom project, which
aims to collect and deeply annotate a set of large corpora from selected domains. The …

Cited by 10 · All 4 versions

Discovering automated lexicography: The case of the Slovene lexical database

P Gantar, I Kosem, S Krek - International Journal of Lexicography, 2016 - academic.oup.com

In this paper, we describe the compilation of the Slovene Lexical Database; main focus
being on developing the methodology to improve the tools used for lexicographic analysis …

Cited by 33

Distributional modelling for semantic shift detection

D Fišer, N Ljubešić - International Journal of Lexicography, 2019 - academic.oup.com

This paper gives an overview of distributional modelling of word meaning for contemporary
lexicography. We also apply it in a case study on automatic semantic shift detection in …

Cited by 25 · All 2 versions

Modernising historical Slovene words

Y Scherrer, T Erjavec - Natural Language Engineering, 2016 - cambridge.org

We propose a language-independent word normalisation method and exemplify it on
modernising historical Slovene words. Our method relies on character-level statistical …

Cited by 33 · All 5 versions

Collocations dictionary of modern Slovene

I Kosem, S Krek, P Gantar, Š Arhar Holdt… - Proceedings of the …, 2018 - researchgate.net

The paper presents the compilation of the Collocations Dictionary of Modern Slovene, a new
resource targeting the language production needs of Slovene speakers. An important aspect …

Cited by 21 · All 5 versions

Predicting the level of text standardness in user-generated content

N Ljubešić, D Fišer, T Erjavec, J Čibej… - Proceedings of the …, 2015 - aclanthology.org

Non-standard language as it appears in user-generated content has recently attracted much
attention. This paper proposes that non-standardness comes in two basic varieties, technical …

Cited by 36 · All 10 versions
