Spyker: Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

Abstract

Federated learning (FL) systems enable multiple clients to train a machine learning model iteratively by synchronously exchanging intermediate model weights with a single server. The scalability of such FL systems can be limited by two factors: server idle time due to synchronous communication and the risk of a single server becoming the bottleneck. In this paper, we propose a new FL architecture, Spyker, the first multi-server FL system that is entirely asynchronous and therefore addresses these two limitations simultaneously. Spyker keeps both servers and clients continuously active. As in previous multi-server methods, clients interact solely with their nearest server, ensuring efficient integration of updates into the model. Unlike in previous methods, however, servers also periodically update each other asynchronously and never postpone interactions with clients. We compare Spyker to three representative baselines - FedAvg, FedAsync and HierFAVG - on the MNIST and CIFAR-10 image classification datasets and on the WikiText-2 language modeling dataset. Spyker converges to similar or higher accuracy levels than these baselines and requires 61% less time to do so in geo-distributed settings.
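
The abstract does not state Spyker's update rules, so the following is only a minimal illustrative sketch of the general idea: each server merges client updates as soon as they arrive, and merges a peer server's model whenever one is received, without blocking clients. The `Server` class, the `mix` helper, the parameters `alpha`, `a` and `beta`, and the FedAsync-style polynomial staleness discount are all assumptions made for illustration, not the paper's algorithm.

```python
from dataclasses import dataclass


def mix(current, incoming, alpha):
    """Return (1 - alpha) * current + alpha * incoming, element-wise."""
    return [(1.0 - alpha) * c + alpha * n for c, n in zip(current, incoming)]


@dataclass
class Server:
    """Hypothetical server holding a flat list of model weights."""
    weights: list
    version: int = 0  # number of updates this server has applied so far

    def on_client_update(self, client_weights, base_version, alpha=0.5, a=0.5):
        # Staleness: how many updates were applied since the client fetched the model.
        staleness = self.version - base_version
        # Assumed polynomial discount: staler updates get a smaller mixing weight.
        alpha_t = alpha * (1.0 + staleness) ** (-a)
        self.weights = mix(self.weights, client_weights, alpha_t)
        self.version += 1
        return alpha_t

    def on_peer_update(self, peer_weights, beta=0.5):
        # Merge a peer server's model on arrival; no waiting on clients or peers.
        self.weights = mix(self.weights, peer_weights, beta)
        self.version += 1


# Two hypothetical regional servers, each serving its nearest clients.
eu = Server(weights=[0.0, 0.0])
us = Server(weights=[0.0, 0.0])
eu.on_client_update([1.0, 1.0], base_version=0)  # fresh update
us.on_client_update([3.0, 3.0], base_version=0)  # fresh update
eu.on_peer_update(us.weights)                    # asynchronous server-to-server merge
```

In this sketch a client update computed against an older model version receives a smaller mixing weight, which is one common way asynchronous FL methods limit the effect of stale updates; whether Spyker uses this particular rule is not stated in the abstract.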

Original language: English
Title of host publication: Middleware 2024 - Proceedings of the 25th ACM International Middleware Conference
Publisher: ACM
Pages: 367-378
Number of pages: 12
ISBN (Electronic): 9798400706233
DOIs
Publication status: Published - 2024
Event: 25th ACM International Middleware Conference, Middleware 2024 - Hong Kong, Hong Kong
Duration: 2 Dec 2024 - 6 Dec 2024

Publication series

Name: Middleware 2024 - Proceedings of the 25th ACM International Middleware Conference

Conference

Conference: 25th ACM International Middleware Conference, Middleware 2024
Country/Territory: Hong Kong
City: Hong Kong
Period: 2/12/24 - 6/12/24

Keywords

  • Asynchronous Learning
  • Byzantine Learning
  • Resource Heterogeneity
