A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Survey of different large language model architectures: Trends, benchmarks, and challenges

M Shao, A Basit, R Karri, M Shafique - IEEE Access, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) represent a class of deep learning models adept at
understanding natural language and generating coherent responses to various prompts or …

ESRL: Efficient sampling-based reinforcement learning for sequence generation

C Wang, H Zhou, Y Hu, Y Huo, B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …
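
As a rough illustration of what "directly optimizing a sequence-level reward" means here, the sketch below applies plain REINFORCE with a moving-average baseline to a toy decoder, using token overlap with a reference as a stand-in for BLEU. It is not the ESRL algorithm from the cited paper; the model, reward, and hyperparameters are invented for illustration.

```python
# Minimal policy-gradient (REINFORCE) sketch for sequence generation.
# NOT the ESRL method; only illustrates optimizing a sequence-level
# reward (a toy overlap score standing in for BLEU) via sampling.
import torch
import torch.nn as nn

vocab_size, hidden, max_len = 12, 32, 6
reference = torch.tensor([3, 5, 7, 2])          # toy "target" sequence

class TinyDecoder(nn.Module):
    """Unconditional toy decoder: GRU cell over its own sampled tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self):
        h = torch.zeros(1, hidden)
        tok = torch.zeros(1, dtype=torch.long)   # BOS = token 0
        log_probs, tokens = [], []
        for _ in range(max_len):
            h = self.rnn(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            tokens.append(tok)
        return torch.cat(tokens), torch.stack(log_probs).sum()

def reward(seq):
    """Toy sequence-level reward: fraction of reference tokens produced."""
    return sum(int(t in seq) for t in reference) / len(reference)

model = TinyDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
baseline = 0.0
for step in range(200):
    seq, log_prob = model()
    r = reward(seq)
    baseline = 0.9 * baseline + 0.1 * r          # moving-average baseline
    loss = -(r - baseline) * log_prob            # REINFORCE with baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```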

RD-NAS: Enhancing one-shot supernet ranking ability via ranking distillation from zero-cost proxies

P Dong, X Niu, L Li, Z Tian, X Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Neural architecture search (NAS) has made tremendous progress in the automatic design of
effective neural network structures but suffers from a heavy computational burden. One-shot …
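
For context on the "zero-cost proxies" named in the title, the sketch below scores untrained candidate networks with a gradient-norm proxy (in the spirit of Abdelfattah et al., 2021) computed on a single random minibatch; RD-NAS itself distills such proxy rankings into a one-shot supernet, which this toy example does not attempt. The candidate architectures and data are placeholders.

```python
# Sketch of a zero-cost proxy: rank untrained candidates by the gradient
# norm of the loss on one random minibatch, with no training at all.
import torch
import torch.nn as nn

def grad_norm_proxy(net, x, y):
    net.zero_grad()
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()
    return sum(p.grad.norm().item() for p in net.parameters() if p.grad is not None)

# Two toy "candidate architectures" standing in for a search space.
candidates = {
    "narrow": nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10)),
    "wide":   nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10)),
}
x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
scores = {name: grad_norm_proxy(net, x, y) for name, net in candidates.items()}
print(scores)   # a cheap ranking signal computed without any training
```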

Neural architecture search on efficient transformers and beyond

Z Liu, D Li, K Lu, Z Qin, W Sun, J Xu… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, numerous efficient Transformers have been proposed to reduce the quadratic
computational complexity of standard Transformers caused by the Softmax attention …
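
The quadratic cost mentioned here comes from materializing an n × n attention matrix. The sketch below contrasts standard softmax attention with one representative efficient alternative, kernelized linear attention (elu+1 feature map as in Katharopoulos et al., 2020); it illustrates the complexity gap only and is unrelated to the cited paper's search method.

```python
# Standard softmax attention builds an n x n score matrix -> O(n^2);
# a kernelized "linear attention" variant avoids that matrix.
import torch

def softmax_attention(q, k, v):
    # q, k, v: (n, d). Scores are (n, n): quadratic time and memory in n.
    scores = q @ k.T / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Feature-map trick: contract (phi(K)^T V) first, a (d, d) product,
    # so the n x n matrix is never formed.
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                                  # (d, d)
    z = q @ k.sum(dim=0, keepdim=True).T + eps    # (n, 1) normalizer
    return (q @ kv) / z

n, d = 128, 16
q, k, v = (torch.randn(n, d) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```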

Introduction to Transformers: an NLP Perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …

Wide attention is the way forward for transformers?

JR Brown, Y Zhao, I Shumailov, RD Mullins - arXiv preprint arXiv …, 2022 - arxiv.org
The Transformer is an extremely powerful and prominent deep learning architecture. In this
work, we challenge the commonly held belief in deep learning that going deeper is better …
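
A minimal sketch of the wider-versus-deeper trade-off the paper questions: two Transformer encoders at roughly matched parameter count, one deep and narrow, one shallow and wide. The concrete widths, depths, and head counts are illustrative choices, not configurations from the paper.

```python
# Compare parameter counts of a deep/narrow and a shallow/wide encoder.
import torch.nn as nn

def encoder(d_model, n_heads, n_layers, d_ff):
    layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff, batch_first=True)
    return nn.TransformerEncoder(layer, n_layers)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

deep_narrow  = encoder(d_model=256, n_heads=4,  n_layers=12, d_ff=1024)
shallow_wide = encoder(d_model=256, n_heads=16, n_layers=4,  d_ff=4096)
print(f"deep/narrow : {n_params(deep_narrow):,} params")
print(f"shallow/wide: {n_params(shallow_wide):,} params")
```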

Learning Evaluation Models from Large Language Models for Sequence Generation

C Wang, H Zhou, K Chang, T Liu, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models achieve state-of-the-art performance on sequence generation
evaluation, but typically have a large number of parameters. This is a computational …

Automatic Fuzzy Architecture Design for Defect Detection via Classifier-Assisted Multiobjective Optimization Approach

N Li, B Xue, L Ma, M Zhang - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Defect recognition is an essential aspect of intelligent manufacturing, but it is a challenging
task with noise and unpredictable uncertainties, where convolutional neural networks …

DARTFormer: Finding the best type of attention

JR Brown, Y Zhao, I Shumailov, RD Mullins - arXiv preprint arXiv …, 2022 - arxiv.org
Given the wide and ever growing range of different efficient Transformer attention
mechanisms, it is important to identify which attention is most effective when given a task. In …