Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

When linguistics meets web technologies. Recent advances in modelling linguistic linked data

AF Khan, C Chiarcos, T Declerck, D Gifu… - Semantic …, 2022 - content.iospress.com

FoLiA: A practical XML format for linguistic annotation–a descriptive and comparative study

M van Gompel, M Reynaert - Computational Linguistics in the …, 2013 - clinjournal.org
In this paper we present FoLiA, a Format for Linguistic Annotation, and conduct a
comparative study with other annotation schemes, including the Linguistic Annotation …

The TEI and current standards for structuring linguistic data. An overview

M Stührenberg - Journal of the text encoding initiative, 2012 - journals.openedition.org
The TEI has served for many years as a mature annotation format for corpora of different
types, including linguistically annotated data. Although it is based on the consensus of a …

KAF: a generic semantic annotation format

W Bosma, P Vossen, A Soroa, G Rigau… - Proceedings of the …, 2009 - academia.edu
We present KAF, the KYOTO Annotation Format. KAF is a layered and extendible linguistic
annotation format that is specifically developed to arrive at semantic interoperability. KAF is …

Models to represent linguistic linked data

J Bosque-Gil, J Gracia, E Montiel-Ponsoda… - Natural Language …, 2018 - cambridge.org
As the interest of the Semantic Web and computational linguistics communities in linguistic
linked data (LLD) keeps increasing and the number of contributions that dwell on LLD …

Some Fine Points of Hybrid Natural Language Parsing

P Adolphs, S Oepen, U Callmeier, B Crysmann… - LREC, 2008 - pages.cs.brandeis.edu
Large-scale grammar-based parsing systems nowadays increasingly rely on independently
developed, more specialized components for pre-processing their input. However, different …

CoNLL-UL: Universal morphological lattices for universal dependency parsing

A More, Ö Çetinoğlu, Ç Çöltekin… - Proceedings of the …, 2018 - aclanthology.org
Following the development of the universal dependencies (UD) framework and the CoNLL
2017 Shared Task on end-to-end UD parsing, we address the need for a universal …

KTimeML: specification of temporal and event expressions in Korean text

S Im, H You, H Jang, S Nam, H Shin - Proceedings of the 7th …, 2009 - aclanthology.org
TimeML, TimeBank, and TTK (TARSQI Project) have been playing an important role
in enhancement of IE, QA, and other NLP applications. TimeML is a specification language …

Integrating deep and shallow natural language processing components: representations and hybrid architectures

U Schäfer - 2006 - publikationen.sulb.uni-saarland.de
We describe basic concepts and software architectures for the integration of shallow and
deep (linguistics-based, semantics-oriented) natural language processing (NLP) …