Speculative decoding with big little decoder

S Kim, K Mangalam, S Moon, J Malik… - Advances in …, 2024 - proceedings.neurips.cc
The recent emergence of Large Language Models based on the Transformer architecture
has enabled dramatic advancements in the field of Natural Language Processing. However …

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Masked image modeling with local multi-scale reconstruction

H Wang, Y Tang, Y Wang, J Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Image Modeling (MIM) achieves outstanding success in self-supervised
representation learning. Unfortunately, MIM models typically have huge computational …

Glancing transformer for non-autoregressive neural machine translation

L Qian, H Zhou, Y Bao, M Wang, L Qiu… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent work on non-autoregressive neural machine translation (NAT) aims at improving the
efficiency by parallel decoding without sacrificing the quality. However, existing NAT …

Step-unrolled denoising autoencoders for text generation

N Savinov, J Chung, M Binkowski, E Elsen… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper we propose a new generative model of text, Step-unrolled Denoising
Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising …

Deep encoder, shallow decoder: Reevaluating non-autoregressive machine translation

J Kasai, N Pappas, H Peng, J Cross… - arXiv preprint arXiv …, 2020 - arxiv.org
Much recent effort has been invested in non-autoregressive neural machine translation,
which appears to be an efficient alternative to state-of-the-art autoregressive machine …

Hand-transformer: Non-autoregressive structured modeling for 3D hand pose estimation

L Huang, J Tan, J Liu, J Yuan - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
3D hand pose estimation is still far from a well-solved problem mainly due to the
highly nonlinear dynamics of hand pose and the difficulties of modeling its inherent …

Understanding and improving lexical choice in non-autoregressive translation

L Ding, L Wang, X Liu, DF Wong, D Tao… - arXiv preprint arXiv …, 2020 - arxiv.org
Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT)
models by reducing the complexity of the raw data with an autoregressive teacher model. In …