Collecting and using comparable corpora for statistical machine translation

I Skadiņa, A Aker, N Mastropavlos, F Su… - Proceedings of the 8th …, 2012 - academia.edu
Lack of sufficient parallel data for many languages and domains is currently one of the major
obstacles to further advancement of automated translation. The ACCURAT project is …

Slavic information extraction and partial parsing

A Przepiórkowski - Proceedings of the Workshop on Balto …, 2007 - aclanthology.org
Abstract Information Extraction (IE) often involves some amount of partial syntactic
processing. This is clear in cases of interesting high-level IE tasks, such as finding …

Hrvatski jezik u digitalnom dobu

M Tadić, D Brozović-Rončević, A Kapetanović - 2012 - darhiv.ffzg.unizg.hr
Information technology changes our everyday lives. We typically use computers for writing,
editing, calculating, and information searching, and increasingly for reading, listening to …

Towards sentiment analysis of financial texts in Croatian

Ž Agić, N Ljubešić, M Tadić - Bull market, 2010 - lrec-conf.org
The paper presents results of an experiment dealing with sentiment analysis of Croatian text
from the domain of finance. The goal of the experiment was to design a system model for …

Tagging named entities in Croatian tweets

K Baksa, D Golović, G Glavaš… - Slovenščina …, 2016 - madoc.bib.uni-mannheim.de
Named entity extraction tools designed for recognizing named entities in texts written in
standard language (e.g., news stories or legal texts) have been shown to be inadequate for …

Evaluating language tools for fifteen EU-official under-resourced languages

D Alves, G Thakkar, M Tadić - arXiv preprint arXiv:2010.12428, 2020 - arxiv.org
This article presents the results of the evaluation campaign of language tools available for
fifteen EU-official under-resourced languages. The evaluation was conducted within the …

Language processing infrastructure in the XLike project

L Padró, Ž Agić, X Carreras, B Fortuna… - LREC 2014: Ninth …, 2014 - upcommons.upc.edu
This paper presents the linguistic analysis tools and its infrastructure developed within the
XLike project. The main goal of the implemented tools is to provide a set of functionalities for …

Robust keyphrase extraction for a large-scale Croatian news production system

J Mijić, BD Bašić, J Šnajder - FASSBL7, 2010 - dcl.bas.bg
Summarizing an article with just a few keyphrases can be a difficult task, even for trained
experts. Large-scale keyphrase extraction requires a method that is fast and reliable, and yet …

Simple Ways to Improve NER in Every Language using Markup.

LA Cabrera-Diego, JG Moreno, A Doucet - CLEOPATRA@ WWW, 2021 - ceur-ws.org
We explore three different methods for improving Named Entity Recognition (NER) systems
based on BERT, each responding to one of three potential issues: the processing of …

CroNER: Recognizing named entities in Croatian using conditional random fields

M Karan, G Glavaš, F Šarić, J Šnajder, J Mijić, A Šilić… - Informatica, 2013 - informatica.si
Informatica 37 (2013) 165–172.