models for downstream tasks. We introduce UDALM, a fine-tuning procedure that uses a mixed
classification and Masked Language Model loss to adapt to the target domain
distribution in a robust and sample-efficient manner.
Our experiments show that the performance of models trained with the mixed loss scales with the amount of available target data, and that the mixed loss can be used effectively as a stopping criterion during UDA training. Furthermore …