FedZero: Leveraging renewable excess energy in federated learning

P Wiesner, R Khalili, D Grinwald, P Agrawal… - Proceedings of the 15th ACM International Conference on Future and …, 2024 - dl.acm.org
Federated Learning (FL) is an emerging machine learning technique that enables distributed model training across data silos or edge devices without data sharing. Yet, FL inevitably introduces inefficiencies compared to centralized model training, which will further increase the already high energy usage and associated carbon emissions of machine learning in the future. One idea to reduce FL’s carbon footprint is to schedule training jobs based on the availability of renewable excess energy that can occur at certain times and places in the grid. However, in the presence of such volatile and unreliable resources, existing FL schedulers cannot always ensure fast, efficient, and fair training.
We propose FedZero, an FL system that operates exclusively on renewable excess energy and spare capacity of compute infrastructure to effectively reduce a training’s operational carbon emissions to zero. Using energy and load forecasts, FedZero leverages the spatio-temporal availability of excess resources by selecting clients for fast convergence and fair participation. Our evaluation, based on real solar and load traces, shows that FedZero converges significantly faster than existing approaches under the mentioned constraints while consuming less energy. Furthermore, it is robust to forecasting errors and scalable to tens of thousands of clients.
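The selection idea described above can be illustrated with a small sketch. This is not the FedZero algorithm itself, which the abstract does not specify. It is a simplified greedy heuristic under stated assumptions: each client has a forecast of excess energy and spare capacity for the next round, a client is eligible only if those forecasts cover a minimum amount of work, and fairness is approximated by preferring clients with the fewest past participations. All names (`Client`, `feasible_batches`, `select_clients`) and units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    excess_energy_wh: float     # forecasted renewable excess energy for the round (assumed unit)
    spare_capacity: float       # forecasted spare compute, in batches per round (assumed unit)
    energy_per_batch_wh: float  # energy the client needs per training batch (assumed)
    participations: int = 0     # number of past rounds the client took part in


def feasible_batches(c: Client) -> float:
    """Batches a client can compute using only excess energy and spare capacity."""
    return min(c.spare_capacity, c.excess_energy_wh / c.energy_per_batch_wh)


def select_clients(clients, n, min_batches):
    """Greedy pick of up to n clients (illustrative heuristic, not the paper's method)."""
    # Only clients whose forecasts cover the minimum work are eligible.
    eligible = [c for c in clients if feasible_batches(c) >= min_batches]
    # Fairness first (fewest past participations), then speed (most feasible work).
    eligible.sort(key=lambda c: (c.participations, -feasible_batches(c)))
    chosen = eligible[:n]
    for c in chosen:
        c.participations += 1
    return chosen
```

Calling `select_clients(clients, n=2, min_batches=10)` once per round returns the clients to train in that round and updates their participation counts, so clients skipped in earlier rounds move up the ranking as soon as their forecasts make them eligible.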