Portuguese word embeddings: Evaluating on word analogies and natural language tasks

N Hartmann, E Fonseca, C Shulby, M Treviso… - arXiv preprint arXiv …, 2017 - arxiv.org
Word embeddings have been found to provide meaningful representations for words in an
efficient way; therefore, they have become common in Natural Language Processing sys …

Unsupervised multilingual sentence boundary detection

T Kiss, J Strunk - Computational Linguistics, 2006 - direct.mit.edu
In this article, we present a language-independent, unsupervised approach to sentence
boundary detection. It is based on the assumption that a large number of ambiguities in the …

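What is a corpus and how is it built? Lessons learned from compiling several corpora for linguistic research (O que é e como se constrói um corpus? Lições aprendidas na compilação de vários corpora para pesquisa lingüística)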

SM Aluísio, GM de Barcellos Almeida - Calidoscópio, 2006 - redalyc.org
Corpus-based research has developed extensively in the Brazilian context over the last decade. Its relevance and pertinence are evident in the fields of Linguistics …

Development of the multilingual semantic annotation system

SSL Piao, F Bianchi, C Dayrell… - Proceedings of the …, 2015 - aclanthology.org
This paper reports on our research to generate multilingual semantic lexical resources and
develop multilingual semantic annotation software, which assigns each word in running text …

Evaluating Solutions for the Rapid Development of State-of-the-Art POS Taggers for Portuguese

A Branco, JR Silva - LREC, 2004 - portulanclarin.net
We report on solutions we adopted for the specific issues that arise when developing new
automatic taggers for Portuguese, solutions whose design is general enough, we believe, to …

An analysis of sentence boundary detection systems for English and Portuguese documents

CN Silla Jr, CAA Kaestner - … Conference on Intelligent Text Processing and …, 2004 - Springer
In this paper we present a study comparing the performance of different systems found in the
literature that perform the task of automatic text segmentation in sentences for English …

The Lácio-Web: Corpora and Tools to Advance Brazilian Portuguese Language Investigations and Computational Linguistic Tools

SM Aluísio, GM Pinheiro, AMP Manfrin… - LREC, 2004 - lrec-conf.org
In this paper we discuss the five requirements for building large publicly available corpora
which geared the construction of the Lácio-Web corpora and their environments: 1) a …

Experiments on sentence boundary detection in user-generated web content

R López, TAS Pardo - … Linguistics and Intelligent Text Processing: 16th …, 2015 - Springer
Sentence Boundary Detection (SBD) is a very important prerequisite for proper
sentence analysis in different Natural Language Processing tasks. During the last years …

PoS-tagging the Web in Portuguese. National varieties, text typologies and spelling systems

M Garcia, P Gamallo, I Gayo… - … del Lenguaje Natural, 2014 - journal.sepln.org
The great amount of text produced every day on the Web has turned it into one of the main sources
for obtaining linguistic corpora, which are then analyzed with Natural Language Processing …

E-TERMOS: A collaborative web environment for terminology management (E-TERMOS: Um ambiente colaborativo web de gestão terminológica)

LHM Oliveira - 2009 - teses.usp.br
In one of its definitions, Terminology is the set of principles and methods
adopted in the process of managing and creating terminological products, such as glossaries …