B Phan, M Havasi, M Muckley, K Ullrich - arXiv e-prints, 2024 - ui.adsabs.harvard.edu
State-of-the-art language models are autoregressive and operate on subword units known
as tokens. Specifically, one must encode the conditioning string into a list of tokens before …