Multilingual document clustering: an heuristic approach based on cognate named entities

Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications

I Vulić, W De Smet, J Tang, MF Moens - Information Processing & …, 2015 - Elsevier

Probabilistic topic models are unsupervised generative models which model document
content as a two-step generation process, that is, documents are observed as mixtures of …

Cited by: 142

Multilingual aspect clustering for sentiment analysis

LRC Pessutto, DS Vargas, VP Moreira - Knowledge-Based Systems, 2020 - Elsevier

In the last few years, there has been growing interest in aspect-based sentiment analysis,
which deals with extracting, clustering, and rating the overall opinion about the features of …

Cited by: 27

Cross-lingual document representation and semantic similarity measure: A fuzzy set and rough set based approach

HH Huang, YH Kuo - IEEE Transactions on Fuzzy Systems, 2010 - ieeexplore.ieee.org

As cross-lingual information retrieval is attracting increasing attention, tools that measure
cross-lingual semantic similarity between documents are becoming desirable. In this paper …

Cited by: 58

Techniques for named entity recognition: a survey

GK Palshikar - Collaboration and the Semantic Web: Social Networks …, 2012 - igi-global.com

While building and using a fully semantic understanding of Web contents is a distant goal,
named entities (NEs) provide a small, tractable set of elements carrying a well-defined …

Cited by: 38

Feature-based method for document alignment in comparable news corpora

T Vu, AT Aw, M Zhang - Proceedings of the 12th Conference of the …, 2009 - aclanthology.org

In this paper, we present a feature-based method to align documents with similar content
across two sets of bilingual comparable corpora from daily news texts. We evaluate the …

Cited by: 46

A light way to collect comparable corpora from the Web.

A Aker, E Kanoulas, RJ Gaizauskas - LREC, 2012 - mt-archive.net

Statistical Machine Translation (SMT) relies on the availability of rich parallel
corpora. However, in the case of under-resourced languages, parallel corpora are not …

Cited by: 38

Bilingual news clustering using named entities and fuzzy similarity

S Montalvo, R Martínez, A Casillas, V Fresno - Text, Speech and Dialogue …, 2007 - Springer

This paper is focused on discovering bilingual news clusters in a comparable corpus.
Particularly, we deal with the news representation and with the calculation of the similarity …

Cited by: 39

Hadoop and natural language processing based analysis on kisan call center (kcc) data

VK Viswanath, CGV Madhuri, C Raviteja… - … on Advances in …, 2018 - ieeexplore.ieee.org

Call Centers have always played a highly significant role in the service industry, from retail
to technical support. Government of India (GOI) has launched Kisan Call Centers (KCC) …

Cited by: 12

Named entity recognition

IM Konkol - University of West Bohemia, 2015 - core.ac.uk

The idea of automatic extraction of important information from text documents comes from
the time of first steps in the natural language processing. Its importance rapidly grows with …

Cited by: 14

A language-independent approach to identify the named entities in under-resourced languages and clustering multilingual documents

NK Kumar, GSK Santosh, V Varma - Multilingual and Multimodal …, 2011 - Springer

This paper presents a language-independent Multilingual Document Clustering (MDC)
approach on comparable corpora. Named entities (NEs) such as persons, locations …

Cited by: 19
