Speculative decoding with big little decoder

S Kim, K Mangalam, S Moon, J Malik… - Advances in …, 2024 - proceedings.neurips.cc
The recent emergence of Large Language Models based on the Transformer architecture
has enabled dramatic advancements in the field of Natural Language Processing. However …

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Masked image modeling with local multi-scale reconstruction

H Wang, Y Tang, Y Wang, J Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Image Modeling (MIM) achieves outstanding success in self-supervised
representation learning. Unfortunately, MIM models typically have huge computational …

Glancing transformer for non-autoregressive neural machine translation

L Qian, H Zhou, Y Bao, M Wang, L Qiu… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent work on non-autoregressive neural machine translation (NAT) aims at improving the
efficiency by parallel decoding without sacrificing the quality. However, existing NAT …

Step-unrolled denoising autoencoders for text generation

N Savinov, J Chung, M Binkowski, E Elsen… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper we propose a new generative model of text, Step-unrolled Denoising
Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising …

Deep encoder, shallow decoder: Reevaluating non-autoregressive machine translation

J Kasai, N Pappas, H Peng, J Cross… - arXiv preprint arXiv …, 2020 - arxiv.org
Much recent effort has been invested in non-autoregressive neural machine translation,
which appears to be an efficient alternative to state-of-the-art autoregressive machine …

Hand-transformer: Non-autoregressive structured modeling for 3D hand pose estimation

L Huang, J Tan, J Liu, J Yuan - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
3D hand pose estimation is still far from a well-solved problem mainly due to the
highly nonlinear dynamics of hand pose and the difficulties of modeling its inherent …

Understanding and improving lexical choice in non-autoregressive translation

L Ding, L Wang, X Liu, DF Wong, D Tao… - arXiv preprint arXiv …, 2020 - arxiv.org
Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT)
models by reducing the complexity of the raw data with an autoregressive teacher model. In …