The paper discusses several key concepts related to the development of corpora and reconsiders them in light of recent developments in NLP. On the basis of an overview of …
P Bański - Balisage: The markup conference, 2010 - balisage.net
The present submission focuses on the concept of stand-off annotation as it is implemented in the current version of the TEI Guidelines. We look at the motivation for choosing the stand …
M Ogrodniczuk, M Lenart - LREC, 2012 - researchgate.net
This paper presents a robust linguistic Web service framework for Polish, combining several mature offline linguistic tools in a common online platform. The toolset comprises paragraph …
P Bański, B Wójtowicz - … : From Storyboard to Sustainability and LR …, 2010 - lrec-conf.org
The paper presents the Open-Content Text Corpus, an open-access open-content versatile TEI-XML-encoded resource located at SourceForge and distributed under the GNU Public …
K Jassem, R Grundkiewicz - … for Computer Science and Linguistics: 6th …, 2016 - Springer
The paper describes an open-source set of linguistic tools, whose distinctive features are its customisability and compatibility with other NLP toolkits: texts in various natural languages …
This document describes the “packaged” version of the National Corpus of Polish-based TEI P5 stand-off annotation format, recently tested with the linguistic Web Service for Polish. It is …
M Ogrodniczuk, M Lenart - … of the Language Technology for a …, 2011 - researchgate.net
Numerous actions taken throughout Europe to fulfil the CLARIN mission of creating, coordinating and making language resources and technology available and readily useable …
С Коева, Д Благоева, С Колковска, Ц Димитрова… - balgarskiezik.eu
The paper offers an overview of the methodology adopted in the development of the Bulgarian National Corpus (BulNC), its structure, linguistic annotation and applications with …
Until recently the creation of large historical reference corpora was, from the point of view of its encoding, a rather project-specific activity. Although reference corpora were built from …