PH Chi, PH Chung, TH Wu, CC Hsieh… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
… At the pre-training stage, we train our models with learning rate 5e-5, batch size 50, and
AdamW optimizer [26] for 500k steps. The models are pre-trained on a single NVIDIA Tesla …
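
The quoted pre-training settings can be collected into a small configuration object. The sketch below is illustrative only: the class name `PretrainConfig`, its field names, and the `samples_seen` helper are hypothetical and do not come from the paper, and it assumes one optimizer update per batch (no gradient accumulation), which the snippet does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Hypothetical container for the hyperparameters quoted in the snippet."""

    learning_rate: float = 5e-5   # "learning rate 5e-5"
    batch_size: int = 50          # "batch size 50"
    optimizer: str = "AdamW"      # "AdamW optimizer [26]"
    total_steps: int = 500_000    # "500k steps"

    def samples_seen(self) -> int:
        # Assumption: one optimizer update per batch, so the number of
        # training examples processed is batch_size * total_steps.
        return self.batch_size * self.total_steps


config = PretrainConfig()
print(config.samples_seen())
```

Under that assumption, 500k steps at batch size 50 amount to 25 million example presentations over the course of pre-training.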