SoMeWeTa: A part-of-speech tagger for German social media and web texts

Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations

M Sanguinetti, C Bosco, L Cassidy, Ö Çetinoğlu… - Language Resources …, 2023 - Springer

This article presents a discussion on the main linguistic phenomena which cause difficulties
in the analysis of user-generated texts found on the web and in social media, and proposes …

Cited by 27 | Related articles | All 18 versions

[PDF] uni-saarland.de

[PDF] Evaluating Off-the-Shelf NLP Tools for German.

K Ortmann, A Roussel, S Dipper - KONVENS, 2019 - sfb1102.uni-saarland.de

It is not always easy to keep track of what tools are currently available for a particular
annotation task, nor is it obvious how the provided models will perform on a given data set …

Cited by 29 | Related articles | All 6 versions

[PDF] mdpi.com

Iterative named entity recognition with conditional random fields

A Alves-Pinto, C Demus, M Spranger, D Labudde… - Applied Sciences, 2021 - mdpi.com

Named entity recognition (NER) constitutes an important step in the processing of
unstructured text content for the extraction of information as well as for the computer …

Cited by 9 | Related articles | All 7 versions

[PDF] unica.it

Treebanking user-generated content: A proposal for a unified representation in Universal Dependencies

M Sanguinetti, B Cristina, C Lauren, C Ozlem… - Proceedings of the 12th …, 2020 - iris.unica.it

The paper presents a discussion on the main linguistic phenomena of user-generated texts
found in web and social media, and proposes a set of annotation guidelines for their …

Cited by 19 | Related articles | All 19 versions

[PDF] hal.science

A corpus of German political speeches from the 21st century

A Barbaresi - 11th Language Resources and Evaluation Conference …, 2018 - hal.science

The present German political speeches corpus follows from an initial release which has been
used in various research contexts. This article documents an updated and extended version …

Cited by 26 | Related articles | All 6 versions

[PDF] arxiv.org

Assessing emoji use in modern text processing tools

AAM Shoeb, G De Melo - arXiv preprint arXiv:2101.00430, 2021 - arxiv.org

Emojis have become ubiquitous in digital communication, due to their visual appeal as well
as their ability to vividly convey human emotion, among other factors. The growing …

Cited by 13 | Related articles | All 12 versions

[PDF] sdu.dk

An annotated social media corpus for German

E Bick - 12th Language Resources and Evaluation …, 2020 - portal.findresearcher.sdu.dk

This paper presents the German Twitter section of a large (2 billion word) bilingual Social
Media corpus for Hate Speech research, discussing the compilation, pseudonymization and …

Cited by 14 | Related articles | All 4 versions

[PDF] usp.br

[PDF] Etiquetagem morfossintática multigênero para o português do Brasil segundo o modelo "Universal Dependencies"

EH Silva, TAS Pardo, NT Roman - Anais, 2023 - repositorio.usp.br

Part of speech tagging is a process that seeks to identify the grammatical classes of words
and symbols (tokens) in a sentence. For Brazilian Portuguese, there is a variety of …

Cited by 3 | Related articles | All 6 versions

[PDF] dlr.de

Building type classification from social media texts via geo-spatial textmining

M Häberle, M Werner, XX Zhu - IGARSS 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

In this work, we present a model for building type classification from Twitter text messages
(tweets) by employing geo-spatial textmining methods. First, we apply standard text pre …

Cited by 14 | Related articles | All 4 versions

[PDF] aclanthology.org

A corpus of German Reddit exchanges (GeRedE)

A Blombach, N Dykes, P Heinrich… - Proceedings of the …, 2020 - aclanthology.org

GeRedE is a 270 million token German CMC corpus containing approximately 380,000
submissions and 6,800,000 comments posted on Reddit between 2010 and 2018. Reddit is …

Cited by 9 | Related articles | All 7 versions
