C-BiLDA: extracting cross-lingual topics from non-parallel texts by distinguishing shared from unshared content

G Heyman, I Vulić, MF Moens - Data Mining and Knowledge Discovery, 2016 - Springer
We study the problem of extracting cross-lingual topics from non-parallel multilingual text
datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two …

A bridge over the language gap: Topic modelling for text analyses across languages for country comparative research

F Lind, JM Eberl, S Galyga… - University of Vienna …, 2019 - reminder-project.eu
Working paper …

Efficient nearest-neighbor search in the probability simplex

K Krstovski, DA Smith, HM Wallach… - Proceedings of the 2013 …, 2013 - dl.acm.org
Document similarity tasks arise in many areas of information retrieval and natural language
processing. A fundamental question when comparing documents is which representation to …

BigARTM: an open-source library for topic modeling of large text collections

К Воронцов, А Фрей, П Ромов, АО Янина… - … data in areas …, 2015 - recognition.su
Abstract: Topic modeling is one of the modern directions in statistical text
analysis, and it has been actively developing over the last 10–15 years …

Temporal and object relations in unsupervised plan and activity recognition

RG Freedman, HT Jung, S Zilberstein - 2015 AAAI Fall Symposium …, 2015 - cdn.aaai.org
We consider ways to improve the performance of unsupervised plan and activity recognition
techniques by considering temporal and object relations in addition to postural data …

Bootstrapping translation detection and sentence extraction from comparable corpora

K Krstovski, DA Smith - Proceedings of the 2016 Conference of …, 2016 - aclanthology.org
Most work on extracting parallel text from comparable corpora depends on linguistic
resources such as seed parallel documents or translation dictionaries. This paper presents a …

Online multilingual topic models with multi-level hyperpriors

K Krstovski, DA Smith, MJ Kurtz - … of the 2016 Conference of the …, 2016 - aclanthology.org
For topic models, such as LDA, that use a bag-of-words assumption, it becomes especially
important to break the corpus into appropriately-sized “documents”. Since the models are …

Bilingual Topic Models for Comparable Corpora

G Balikas, MR Amini, M Clausel - arXiv preprint arXiv:2111.15278, 2021 - arxiv.org
Probabilistic topic models like Latent Dirichlet Allocation (LDA) have been previously
extended to the bilingual setting. A fundamental modeling assumption in several of these …

Mining and learning from multilingual text collections using topic models and word embeddings

G Balikas - 2017 - hal.science
Text is one of the most pervasive and persistent sources of information. Content analysis of
text in its broad sense refers to methods for studying and retrieving information from …

Multilingual Topic Models

K Krstovski, MJ Kurtz, DA Smith… - arXiv preprint arXiv …, 2017 - arxiv.org
Scientific publications have evolved several features for mitigating vocabulary mismatch
when indexing, retrieving, and computing similarity between articles. These mitigation …