J Alman, Z Song - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In modern machine learning, inner product attention computation is a fundamental task for
training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and …