Dialect-to-standard normalization: A large-scale multilingual evaluation

O Kuparinen, AM Haddad… - Conference on Empirical …, 2023 - researchportal.helsinki.fi
Text normalization methods have been commonly applied to historical language or user-
generated content, but less often to dialectal transcriptions. In this paper, we introduce …

Accessing spoken language corpora: an overview of current approaches

J Batinić, E Frick, T Schmidt - Corpora, 2021 - euppublishing.com
In this paper, we present an overview of freely available web applications providing online
access to spoken language corpora. We explore and discuss various solutions with which …

The Janes project: language resources and tools for Slovene user generated content

D Fišer, N Ljubešić, T Erjavec - Language resources and evaluation, 2020 - Springer
The paper presents the results of the Janes project, which aimed to develop language
resources and tools for Slovene user generated content. The paper first describes the 200 …

Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations

M Sanguinetti, C Bosco, L Cassidy, Ö Çetinoğlu… - Language Resources …, 2023 - Springer
This article presents a discussion on the main linguistic phenomena which cause difficulties
in the analysis of user-generated texts found on the web and in social media, and proposes …

Gesprächskorpora

T Schmidt, M Kupietz, T Schmidt - Korpuslinguistik, 2018 - degruyter.com
This contribution deals with conversation corpora as a particular type of spoken
language corpus. First, essential properties …

Construction and Dissemination of a Corpus of Spoken Interaction – Tools and Workflows in the FOLK project

T Schmidt - Journal for language technology and computational …, 2016 - jlcl.org
This paper is about the workflow for construction and dissemination of FOLK
(Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of …

Zur Stratifikation des FOLK-Korpus: Konzeption und Strategien

J Kaiser - Gesprächsforschung, 2018 - ids-pub.bsz-bw.de
The Research and Teaching Corpus of Spoken German (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK), accessible via the
Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD), aims for the status of a reference corpus for …

The universal dependencies treebank of spoken Slovenian

K Dobrovoljc, J Nivre - … of the Tenth International Conference on …, 2016 - aclanthology.org
This paper presents the construction of an open-source dependency treebank of spoken
Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian …

Spoken corpora of Slavic languages

N Dobrushina, E Sokur - Russian Linguistics, 2022 - Springer
Spoken corpora are collections of transcribed and annotated audio and/or video recordings
of languages or language varieties. The aim of this paper is to present an overview of 51 …

Annotating dialogue acts in speech data: Problematic issues and basic dialogue act categories

D Verdonik - International Journal of Corpus Linguistics, 2023 - jbe-platform.com
The aims of this paper are to detect the most problematic issues related to dialogue act
annotation in speech corpora and to define basic categories of dialogue acts. I critically …