Y Jiang, Q He, X Zhuang, Z Wu, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Existing large language models have to run K times to generate a sequence of K tokens. In
this paper, we present RecycleGPT, a generative language model with fast decoding speed …