Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications

I Vulić, W De Smet, J Tang, MF Moens - Information Processing & …, 2015 - Elsevier
Probabilistic topic models are unsupervised generative models which model document
content as a two-step generation process, that is, documents are observed as mixtures of …

Multilingual aspect clustering for sentiment analysis

LRC Pessutto, DS Vargas, VP Moreira - Knowledge-Based Systems, 2020 - Elsevier
In the last few years, there has been growing interest in aspect-based sentiment analysis,
which deals with extracting, clustering, and rating the overall opinion about the features of …

Cross-lingual document representation and semantic similarity measure: A fuzzy set and rough set based approach

HH Huang, YH Kuo - IEEE Transactions on Fuzzy Systems, 2010 - ieeexplore.ieee.org
As cross-lingual information retrieval is attracting increasing attention, tools that measure
cross-lingual semantic similarity between documents are becoming desirable. In this paper …

Techniques for named entity recognition: a survey

GK Palshikar - Collaboration and the Semantic Web: Social Networks …, 2012 - igi-global.com
While building and using a fully semantic understanding of Web contents is a distant goal,
named entities (NEs) provide a small, tractable set of elements carrying a well-defined …

Feature-based method for document alignment in comparable news corpora

T Vu, AT Aw, M Zhang - Proceedings of the 12th Conference of the …, 2009 - aclanthology.org
In this paper, we present a feature-based method to align documents with similar content
across two sets of bilingual comparable corpora from daily news texts. We evaluate the …

A light way to collect comparable corpora from the Web

A Aker, E Kanoulas, RJ Gaizauskas - LREC, 2012 - mt-archive.net
Statistical Machine Translation (SMT) relies on the availability of rich parallel
corpora. However, in the case of under-resourced languages, parallel corpora are not …

Bilingual news clustering using named entities and fuzzy similarity

S Montalvo, R Martínez, A Casillas, V Fresno - Text, Speech and Dialogue …, 2007 - Springer
This paper is focused on discovering bilingual news clusters in a comparable corpus.
Particularly, we deal with the news representation and with the calculation of the similarity …

Hadoop and natural language processing based analysis on Kisan Call Center (KCC) data

VK Viswanath, CGV Madhuri, C Raviteja… - … on Advances in …, 2018 - ieeexplore.ieee.org
Call Centers have always played a highly significant role in the service industry, from retail
to technical support. Government of India (GOI) has launched Kisan Call Centers (KCC) …

Named entity recognition

IM Konkol - University of West Bohemia, 2015 - core.ac.uk
The idea of automatic extraction of important information from text documents comes from
the time of first steps in the natural language processing. Its importance rapidly grows with …

A language-independent approach to identify the named entities in under-resourced languages and clustering multilingual documents

NK Kumar, GSK Santosh, V Varma - Multilingual and Multimodal …, 2011 - Springer
This paper presents a language-independent Multilingual Document Clustering (MDC)
approach on comparable corpora. Named entities (NEs) such as persons, locations …