The TEI and current standards for structuring linguistic data. An overview

M Stührenberg - Journal of the Text Encoding Initiative, 2012 - journals.openedition.org
The TEI has served for many years as a mature annotation format for corpora of different
types, including linguistically annotated data. Although it is based on the consensus of a …

The Bulgarian National Corpus: Theory and practice in corpus design

S Koeva, I Stoyanova, S Leseva… - Journal of Language …, 2012 - jlm.ipipan.waw.pl
The paper discusses several key concepts related to the development of corpora and
reconsiders them in light of recent developments in NLP. On the basis of an overview of …

[HTML] Why TEI stand-off annotation doesn't quite work

P Bański - Balisage: The Markup Conference, 2010 - balisage.net
The present submission focuses on the concept of stand-off annotation as it is implemented
in the current version of the TEI Guidelines. We look at the motivation for choosing the stand …

[PDF] Web Service integration platform for Polish linguistic resources.

M Ogrodniczuk, M Lenart - LREC, 2012 - researchgate.net
This paper presents a robust linguistic Web service framework for Polish, combining several
mature offline linguistic tools in a common online platform. The toolset comprises paragraph …

[PDF] The Open-Content Text Corpus project

P Bański, B Wójtowicz - … : From Storyboard to Sustainability and LR …, 2010 - lrec-conf.org
The paper presents the Open-Content Text Corpus, an open-access open-content versatile
TEI-XML-encoded resource located at SourceForge and distributed under the GNU Public …

An Example of a Compatible NLP Toolkit

K Jassem, R Grundkiewicz - … for Computer Science and Linguistics: 6th …, 2016 - Springer
The paper describes an open-source set of linguistic tools, whose distinctive features are its
customisability and compatibility with other NLP toolkits: texts in various natural languages …

[PDF] The Packaged TEI P5-based Stand-off Annotation Format

M Ogrodniczuk - 2011 - nlp.ipipan.waw.pl
This document describes the “packaged” version of the National Corpus of Polish-based TEI
P5 stand-off annotation format, recently tested with the linguistic Web Service for Polish. It is …

Multipurpose Linguistic Web Service for Polish

M Ogrodniczuk, M Lenart - … of the Language Technology for a …, 2011 - researchgate.net
Numerous actions taken throughout Europe to fulfil the CLARIN mission of creating,
coordinating and making language resources and technology available and readily useable …

[PDF] The Bulgarian National Corpus in the context of contemporary linguistics

С Коева, Д Благоева, С Колковска, Ц Димитрова… - balgarskiezik.eu
The paper offers an overview of the methodology adopted in the development of the
Bulgarian National Corpus (BulNC), its structure, linguistic annotation and applications with …

[PDF] Journal of the Text Encoding Initiative

S Haaf, A Geyken, F Wiegand - researchgate.net
Until recently the creation of large historical reference corpora was, from the point of view
of its encoding, a rather project-specific activity. Although reference corpora were built from …