RankNAS: Efficient neural architecture search by pairwise ranking

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Cited by 88 | Related articles | All 3 versions

[PDF] ieee.org

Survey of different large language model architectures: Trends, benchmarks, and challenges

M Shao, A Basit, R Karri, M Shafique - IEEE Access, 2024 - ieeexplore.ieee.org

Large Language Models (LLMs) represent a class of deep learning models adept at
understanding natural language and generating coherent responses to various prompts or …

Cited by 8 | Related articles | All 6 versions

[PDF] aaai.org

ESRL: Efficient sampling-based reinforcement learning for sequence generation

C Wang, H Zhou, Y Hu, Y Huo, B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org

Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …

Cited by 7 | Related articles | All 6 versions

[PDF] arxiv.org

RD-NAS: Enhancing one-shot supernet ranking ability via ranking distillation from zero-cost proxies

P Dong, X Niu, L Li, Z Tian, X Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Neural architecture search (NAS) has made tremendous progress in the automatic design of
effective neural network structures but suffers from a heavy computational burden. One-shot …

Cited by 19 | Related articles | All 4 versions

[PDF] arxiv.org

Neural architecture search on efficient transformers and beyond

Z Liu, D Li, K Lu, Z Qin, W Sun, J Xu… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, numerous efficient Transformers have been proposed to reduce the quadratic
computational complexity of standard Transformers caused by the Softmax attention …

Cited by 16 | Related articles | All 2 versions

[PDF] arxiv.org

Introduction to Transformers: an NLP Perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org

Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …

Cited by 20 | Related articles | All 4 versions

[PDF] arxiv.org

Wide attention is the way forward for transformers?

JR Brown, Y Zhao, I Shumailov, RD Mullins - arXiv preprint arXiv …, 2022 - arxiv.org

The Transformer is an extremely powerful and prominent deep learning architecture. In this
work, we challenge the commonly held belief in deep learning that going deeper is better …

Cited by 8 | Related articles | All 5 versions

[PDF] arxiv.org

Learning Evaluation Models from Large Language Models for Sequence Generation

C Wang, H Zhou, K Chang, T Liu, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models achieve state-of-the-art performance on sequence generation
evaluation, but typically have a large number of parameters. This is a computational …

Cited by 4 | Related articles | All 2 versions

Automatic Fuzzy Architecture Design for Defect Detection via Classifier-Assisted Multiobjective Optimization Approach

N Li, B Xue, L Ma, M Zhang - IEEE Transactions on …, 2025 - ieeexplore.ieee.org

Defect recognition is an essential aspect of intelligent manufacturing, but it is a challenging
task with noise and unpredictable uncertainties, where convolutional neural networks …

[PDF] arxiv.org

DARTFormer: Finding the best type of attention

JR Brown, Y Zhao, I Shumailov, RD Mullins - arXiv preprint arXiv …, 2022 - arxiv.org

Given the wide and ever growing range of different efficient Transformer attention
mechanisms, it is important to identify which attention is most effective when given a task. In …

Cited by 6 | Related articles | All 3 versions
