The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long …
As one of the most representative DL techniques, the Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise …
Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision …
Z Zhang, H Gao, A Liu, Q Chen, F Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Human motion generation is a cutting-edge area of research in generative computer vision, with promising applications in video creation, game development, and robotic manipulation. The …
M Li, J Yuan, S Chen, L Zhang, A Zhu, X Chen… - The Thirty-eighth Annual … - openreview.net
Transformer-based architectures have been proven successful in detecting 3D objects from point clouds. However, the quadratic complexity of the attention mechanism struggles to …