Masked particle modeling on sets: Towards self-supervised high energy physics foundation models

L Heinrich, T Golling, M Kagan, S Klein… - Machine Learning …, 2024 - iopscience.iop.org
We propose masked particle modeling (MPM) as a self-supervised method for learning
generic, transferable, and reusable representations on unordered sets of inputs for use in …
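
The MPM entry above centres on masking particles in an unordered set and recovering them; the following is a minimal sketch of that corruption step only, assuming particles are rows of continuous features. The function name, feature count, and 40% mask fraction are illustrative, not taken from the paper.

```python
# Hedged sketch of the masking step behind masked-particle-style pre-training:
# hide a random subset of a jet's particles and keep them as regression targets.
import numpy as np

rng = np.random.default_rng(0)

def mask_particle_set(particles, mask_frac=0.4, mask_token=0.0):
    """Replace a random subset of particles with a mask token.

    particles: (n_particles, n_features) array for one jet.
    Returns the corrupted set, the boolean mask, and the hidden targets.
    """
    n = particles.shape[0]
    n_mask = max(1, int(mask_frac * n))
    idx = rng.choice(n, size=n_mask, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    corrupted = particles.copy()
    corrupted[mask] = mask_token  # a set-based encoder is then trained to predict the hidden rows
    return corrupted, mask, particles[mask]

# Toy jet: 6 particles, 3 features each (e.g. pt, eta, phi)
jet = rng.normal(size=(6, 3))
corrupted, mask, targets = mask_particle_set(jet)
print(mask, targets.shape)
```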

GIVT: Generative infinite-vocabulary transformers

M Tschannen, C Eastwood, F Mentzer - arXiv preprint arXiv:2312.02116, 2023 - arxiv.org
We introduce generative infinite-vocabulary transformers (GIVT) which generate vector
sequences with real-valued entries, instead of discrete tokens from a finite vocabulary. To …
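
The snippet above contrasts real-valued vector outputs with discrete tokens from a finite vocabulary. As a rough illustration of that contrast (the single-Gaussian head below is a stand-in; it is not GIVT's exact output parameterisation, and all dimensions are made up), compare the two output heads:

```python
# Hedged sketch: finite-vocabulary softmax head vs. a real-valued ("infinite
# vocabulary") head that predicts a continuous distribution over token vectors.
import torch
import torch.nn as nn

d_model, vocab_size, token_dim = 256, 32000, 16

# Finite vocabulary: project to logits, softmax over discrete tokens.
discrete_head = nn.Linear(d_model, vocab_size)

# Real-valued tokens: predict mean and log-variance of a continuous vector
# instead of logits over a fixed codebook.
continuous_head = nn.Linear(d_model, 2 * token_dim)

h = torch.randn(1, d_model)                      # hidden state for one position
probs = torch.softmax(discrete_head(h), dim=-1)  # (1, vocab_size)
mean, log_var = continuous_head(h).chunk(2, dim=-1)
sample = mean + torch.randn_like(mean) * (0.5 * log_var).exp()  # (1, token_dim)
```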

Vector Quantization for Recommender Systems: A Review and Outlook

Q Liu, X Dong, J Xiao, N Chen, H Hu, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vector quantization, renowned for its unparalleled feature compression capabilities, has
been a prominent topic in signal processing and machine learning research for several …

EdVAE: Mitigating codebook collapse with evidential discrete variational autoencoders

G Baykal, M Kandemir, G Unal - Pattern Recognition, 2024 - Elsevier
Codebook collapse is a common problem in training deep generative models with discrete
representation spaces like Vector Quantized Variational Autoencoders (VQ-VAEs). We …
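
Since codebook collapse is the central issue in this entry, a minimal sketch of the VQ-VAE quantization step and of the usage statistic where collapse becomes visible may help; codebook size, dimensions, and batch size below are illustrative, and the straight-through estimator and commitment losses are omitted.

```python
# Hedged sketch of vector quantization: map encoder outputs to their nearest
# codebook entries, then check how many codes are actually used.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # 512 code vectors of dimension 64

def quantize(z):
    """Assign each encoder output to its nearest codebook entry (L2 distance)."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, 512)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

z = rng.normal(size=(256, 64))             # a batch of encoder outputs
z_q, idx = quantize(z)

# Codebook collapse shows up as only a handful of entries ever being selected,
# i.e. low active-code count and low codebook perplexity.
usage = np.bincount(idx, minlength=len(codebook)) / len(idx)
p = usage[usage > 0]
perplexity = np.exp(-(p * np.log(p)).sum())
print(f"active codes: {(usage > 0).sum()} / {len(codebook)}, perplexity: {perplexity:.1f}")
```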

Learning to act without actions

D Schmidt, M Jiang - arXiv preprint arXiv:2312.10812, 2023 - arxiv.org
Pre-training large models on vast amounts of web data has proven to be an effective
approach for obtaining powerful, general models in several domains, including language …

VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

S Li, Z Wang, Z Liu, D Wu, C Tan, J Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Similar to natural language models, pre-trained genome language models are proposed to
capture the underlying intricacies within genomes with unsupervised sequence modeling …

VQ-NeRV: A Vector Quantized Neural Representation for Videos

Y Xu, X Feng, F Qin, R Ge, Y Peng, C Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Implicit neural representations (INR) excel in encoding videos within neural networks,
showcasing promise in computer vision tasks like video compression and denoising. INR …

OmniJet-α: The first cross-task foundation model for particle physics

J Birk, A Hallin, G Kasieczka - arXiv preprint arXiv:2403.05618, 2024 - arxiv.org
Foundation models are multi-dataset and multi-task machine learning methods that, once pre-trained, can be fine-tuned for a large variety of downstream applications. The successful …

Learning the Language of Protein Structure

B Gaujac, J Donà, L Copoiu, T Atkinson… - arXiv preprint arXiv …, 2024 - arxiv.org
Representation learning and de novo generation of proteins are pivotal
computational biology tasks. Whilst natural language processing (NLP) techniques have …

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

D Yang, H Guo, Y Wang, R Huang, X Li, X Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated supreme capabilities in text
understanding and generation, but cannot be directly applied to cross-modal tasks without …