Learning topic models: Identifiability and finite-sample analysis

Y Chen, S He, Y Yang, F Liang - Journal of the American Statistical …, 2023 - Taylor & Francis
Topic models provide a useful text-mining tool for learning, extracting, and discovering latent
structures in large text corpora. Although a plethora of methods have been proposed for …

DOLDA: a regularized supervised topic model for high-dimensional multi-class regression

M Magnusson, L Jonsson, M Villani - Computational Statistics, 2020 - Springer
Generating user interpretable multi-class predictions in data-rich environments with many
classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent …

Polya urn latent Dirichlet allocation: a doubly sparse massively parallel sampler

A Terenin, M Magnusson, L Jonsson… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language
processing and machine learning. Most approaches to training the model rely on iterative …

The Cambridge Law Corpus: a dataset for legal AI research

A Östling, H Sargeant, H Xie, L Bull… - … of Cambridge Faculty …, 2024 - papers.ssrn.com
We introduce the Cambridge Law Corpus (CLC), a corpus for legal AI research. It
consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but …

Automatic localization of bugs to faulty components in large scale software systems using Bayesian classification

L Jonsson, D Broman, M Magnusson… - … on Software Quality …, 2016 - ieeexplore.ieee.org
We suggest a Bayesian approach to the problem of reducing bug turn-around time in large
software development organizations. Our approach is to use classification to predict where …

[BOOK] Machine learning-based bug handling in large-scale software development

L Jonsson - 2018 - books.google.com
This thesis investigates the possibilities of automating parts of the bug handling process in
large-scale software development organizations. The bug handling process is a large part of …

Exploit latent Dirichlet allocation for collaborative filtering

Z Li, H Zhang, S Wang, F Huang, Z Li… - Frontiers of Computer …, 2018 - Springer
Previous work on the one-class collaborative filtering (OCCF) problem can be roughly
categorized into pointwise methods, pairwise methods, and content-based methods. A …

Discovering medication patterns for high-complexity drug-using diseases through electronic medical records

H Huang, X Shang, H Zhao, N Wu, W Li, Y Xu… - IEEE …, 2019 - ieeexplore.ieee.org
An Electronic Medical Record (EMR) is a professional document that contains all data
generated during the treatment process. The EMR can utilize various data formats, such as …

Bayesian allocation model: Inference by sequential Monte Carlo for nonnegative tensor factorizations and topic models using Polya urns

AT Cemgil, MB Kurutmaz, S Yildirim, M Barsbey… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a dynamic generative model, Bayesian allocation model (BAM), which
establishes explicit connections between nonnegative tensor factorization (NTF), graphical …

DOLDA-a regularized supervised topic model for high-dimensional multi-class regression

M Magnusson, L Jonsson, M Villani - arXiv preprint arXiv:1602.00260, 2016 - arxiv.org
Generating user interpretable multi-class predictions in data-rich environments with many
classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent …