The demand for machine learning (ML) has increased significantly in recent decades, enabling several applications, such as speech recognition, computer vision, and …
Distributed training of foundation models, especially large language models (LLMs), is communication-intensive and has therefore relied heavily on centralized data centers with fast …
Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a …
N Bouacida, P Mohapatra - IEEE Access, 2021 - ieeexplore.ieee.org
With more regulations in recent years governing the protection of users' privacy-sensitive data, access to such data has become increasingly restricted. A new decentralized training …
Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised …
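The snippet above names error feedback but is cut off before describing the mechanism. For orientation, the following is a minimal, generic sketch of EF paired with a top-k compression operator, written as a single-worker loop; the names (top_k, ef_sgd, grad_fn) and the quadratic test problem are illustrative assumptions, not the specific formulation studied in that paper.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd(grad_fn, x0, lr=0.1, k=10, steps=200):
    """Generic error-feedback SGD: compress (residual + scaled gradient),
    apply only the compressed part, and carry the compression error forward."""
    x = x0.astype(float).copy()
    e = np.zeros_like(x)               # residual of everything not yet sent
    for _ in range(steps):
        p = e + lr * grad_fn(x)        # add back what was previously dropped
        delta = top_k(p, k)            # the message actually communicated
        e = p - delta                  # new residual kept locally
        x -= delta                     # update uses only the transmitted part
    return x

# Toy usage: minimize ||x||^2 while communicating only k coordinates per step.
x_hat = ef_sgd(lambda x: 2 * x, x0=np.random.randn(100), lr=0.1, k=10)
```

The point of the residual e is that coordinates dropped by the compressor are not lost: they accumulate and are eventually applied, which is what stabilizes convergence under aggressive compression.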
Although deep learning has made great progress in recent years, the exploding economic and environmental costs of training neural networks are becoming unsustainable. To …
G Wu, S Gong - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
Contemporary domain generalization (DG) and multi-source unsupervised domain adaptation (UDA) methods mostly collect data from multiple domains together for joint …
Powerful computer clusters are used nowadays to train complex deep neural networks (DNNs) on large datasets. Distributed training is increasingly becoming communication-bound …
In the traditional distributed machine learning scenario, users' private data are transmitted between clients and a central server, which creates significant privacy risks. In …
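As a contrast to the centralized setup described in the snippet above, the sketch below shows the generic federated-averaging idea, in which clients train locally and exchange only model updates rather than raw data. The helper names (local_sgd, federated_averaging) and the toy least-squares objective are assumptions for illustration, not the method proposed in the truncated abstract.

```python
import numpy as np

def local_sgd(w_global, data, lr=0.05, epochs=5):
    """One client's local update on its private data (toy least squares)."""
    X, y = data
    w = w_global.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(w_init, client_datasets, rounds=20):
    """Server loop: clients send model parameters, never their raw data."""
    w = w_init.copy()
    for _ in range(rounds):
        local_ws = [local_sgd(w, d) for d in client_datasets]
        w = np.mean(local_ws, axis=0)   # simple unweighted average of updates
    return w

# Toy usage: three clients, each holding its own private (X, y) shard.
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
clients = []
for _ in range(3):
    X = rng.normal(size=(40, 5))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=40)))
w_hat = federated_averaging(np.zeros(5), clients)
```

A weighted average (by local dataset size) is the more common aggregation rule; the unweighted mean here keeps the sketch minimal.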