Authors
Guangyu Sun, Matias Mendieta, Aritra Dutta, Xin Li, Chen Chen
Publication date
2024/4/18
Journal
arXiv preprint arXiv:2404.12467
Description
Multi-modal transformers mark significant progress in different domains, but siloed high-quality data hinders their further improvement. To remedy this, federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models without direct access to the raw data held by different clients. Despite its potential, a considerable research direction regarding unpaired uni-modal clients and the transformer architecture in FL remains unexplored. To fill this gap, this paper explores a transfer multi-modal federated learning (MFL) scenario within the vision-language domain, where clients possess data of various modalities distributed across different datasets. We systematically evaluate the performance of existing methods when a transformer architecture is utilized and introduce a novel framework, Federated modality complementary and collaboration (FedCola), which addresses the in-modality and cross-modality gaps among clients. Through extensive experiments across various FL settings, FedCola demonstrates superior performance over previous approaches, offering new perspectives on future federated training of multi-modal transformers.