H Yang, J Zhou, Y Fu, X Wang, R Roane… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) is extremely memory-hungry. To address this problem,
existing work combines the CPU and GPU for the training process, such as …