Comparison of vision transformers and convolutional neural networks in medical image analysis: a systematic review

S Takahashi, Y Sakaguchi, N Kouno… - Journal of Medical …, 2024 - Springer
In the rapidly evolving field of medical image analysis utilizing artificial intelligence (AI), the
selection of appropriate computational models is critical for accurate diagnosis and patient …

EfficientFormer: Vision transformers at MobileNet speed

Y Li, G Yuan, Y Wen, J Hu… - Advances in …, 2022 - proceedings.neurips.cc
Abstract Vision Transformers (ViT) have shown rapid progress in computer vision tasks,
achieving promising results on various benchmarks. However, due to the massive number of …

A comprehensive review and a taxonomy of edge machine learning: Requirements, paradigms, and techniques

W Li, H Hacid, E Almazrouei, M Debbah - AI, 2023 - mdpi.com
The union of Edge Computing (EC) and Artificial Intelligence (AI) has brought forward the
Edge AI concept to provide intelligent solutions close to the end-user environment, for …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

ElasticViT: Conflict-aware supernet training for deploying fast vision transformer on diverse mobile devices

C Tang, LL Zhang, H Jiang, J Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Neural Architecture Search (NAS) has shown promising performance in the
automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing …

Advancements in accelerating deep neural network inference on AIoT devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

MobileDiffusion: Subsecond text-to-image generation on mobile devices

Y Zhao, Y Xu, Z Xiao, T Hou - arXiv preprint arXiv:2311.16567, 2023 - arxiv.org
The deployment of large-scale text-to-image diffusion models on mobile devices is impeded
by their substantial model size and slow inference speed. In this paper, we propose …

RGB no more: Minimally-decoded JPEG Vision Transformers

J Park, J Johnson - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Most neural networks for computer vision are designed to infer using RGB images. However,
these RGB images are commonly encoded in JPEG before saving to disk; decoding them …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Comput. Surv., 2024 - xumengwei.github.io
In the rapidly evolving field of artificial intelligence (AI), a paradigm shift is underway. We are
witnessing the transition from specialized, fragmented deep learning models to versatile …

Learned thresholds token merging and pruning for vision transformers

M Bonnaerens, J Dambre - arXiv preprint arXiv:2307.10780, 2023 - arxiv.org
Vision transformers have demonstrated remarkable success in a wide range of computer
vision tasks over the last years. However, their high computational costs remain a significant …