Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Federated vehicular transformers and their federations: Privacy-preserving computing and cooperation for autonomous driving

Y Tian, J Wang, Y Wang, C Zhao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Cooperative computing is promising to enhance the performance and safety of autonomous
vehicles benefiting from the increase in the amount, diversity as well as scope of data …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

CSwin-PNet: A CNN-Swin Transformer combined pyramid network for breast lesion segmentation in ultrasound images

H Yang, D Yang - Expert Systems with Applications, 2023 - Elsevier
Currently, the automatic segmentation of breast tumors based on breast ultrasound (BUS)
images is still a challenging task. Most lesion segmentation methods are implemented …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

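Key problems in vision Transformer research: Current status and prospects

Y Tian, Y Wang, J Wang, X Wang, FY Wang - Acta Automatica Sinica, 2022 - aas.net.cn
The long-range modeling and parallel computing capabilities of the Transformer have brought it great success in natural language
processing, and it is gradually being extended to computer vision and other fields. Taking classification as its entry point, this paper introduces typical vision Transformers …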

Pyramid self-attention polymerization learning for semi-supervised skeleton-based action recognition

B Xu, X Shu - arXiv preprint arXiv:2302.02327, 2023 - arxiv.org
Most semi-supervised skeleton-based action recognition approaches aim to learn the
skeleton action representations only at the joint level, but neglect the crucial motion …

Semantic segmentation using Vision Transformers: A survey

H Thisanke, C Deshan, K Chamith… - … Applications of Artificial …, 2023 - Elsevier
Semantic segmentation has a broad range of applications in a variety of domains including
land coverage analysis, autonomous driving, and medical image analysis. Convolutional …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-Based Systems, 2022 - Elsevier
Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …