X Zhang, C Du, T Pang, Q Liu, W Gao, M Lin - arXiv preprint arXiv …, 2024 - arxiv.org
The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving …
In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and …
J Lu, J Li, S An, M Zhao, Y He, D Yin, X Sun - arXiv preprint arXiv …, 2024 - arxiv.org
Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a …