Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that …
As a newly emerging advance in deep generative models, diffusion models have achieved state-of-the-art results in many fields, including computer vision, natural language …
The study of time series data is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand …
Z Fei, M Fan, C Yu, D Li, J Huang - arXiv preprint arXiv:2404.04478, 2024 - arxiv.org
Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for …
X Chu, J Su, B Zhang, C Shen - arXiv preprint arXiv:2403.00522, 2024 - arxiv.org
Large language models are built on top of a transformer-based architecture to process textual inputs. For example, LLaMA stands out among many open-source …
Z Yuan, R Chen, Z Li, H Jia, L He, C Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video …
JYC Hu, W Wu, Z Li, Z Song, H Liu - arXiv preprint arXiv:2407.01079, 2024 - arxiv.org
We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs) under the low-dimensional linear latent space assumption …
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics. At test time, given a single image and a sparse set of motion …
D Yang, R Huang, Y Wang, H Guo, D Chong… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At the high level …