With the massive amount of data generated by mobile devices and the growing computing power of edge devices, the paradigm of federated learning has gained considerable momentum. In federated learning, distributed and heterogeneous nodes collaborate to learn model parameters. However, while it provides benefits such as privacy by design and reduced latency, the heterogeneous network presents challenges to the synchronisation methods, or barrier control methods, used in training, particularly with regard to system progress and model convergence. The design of these barrier mechanisms is critical for the performance and scalability of federated learning systems. We propose a new barrier control technique called Probabilistic Synchronous Parallel (PSP). In contrast to existing mechanisms, it introduces a sampling primitive that composes with existing barrier control mechanisms to produce a family of mechanisms with improved convergence speed and scalability. Our proposal is supported by a convergence analysis of a PSP-based SGD algorithm. In practice, we also propose heuristic techniques that further improve the efficiency of PSP. We evaluate the proposed methods on FEMNIST, a dataset designed specifically for federated learning. The evaluation results show that PSP achieves a good balance between system efficiency and model accuracy, mitigating the challenge of heterogeneity in federated learning.
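The idea of a sampling primitive that composes with an existing barrier can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the function names (`bsp`, `ssp`, `with_sampling`), the step-counter representation, and the uniform sampling rule are all assumptions made for illustration. Each barrier is modelled as a predicate that decides whether a worker may advance, given its own step count and the step counts of the peers it observes.

```python
import random
from typing import Callable, Sequence

# A barrier decides whether a worker may advance, given its own completed
# step count and the step counts of the peers it can observe.
# (Hypothetical interface, chosen for this illustration only.)
Barrier = Callable[[int, Sequence[int]], bool]


def bsp(my_step: int, peer_steps: Sequence[int]) -> bool:
    """Bulk-synchronous style: advance only if no observed peer is behind."""
    return all(s >= my_step for s in peer_steps)


def ssp(staleness: int) -> Barrier:
    """Stale-synchronous style: tolerate peers up to `staleness` steps behind."""
    def barrier(my_step: int, peer_steps: Sequence[int]) -> bool:
        return all(my_step - s <= staleness for s in peer_steps)
    return barrier


def with_sampling(barrier: Barrier, sample_size: int,
                  rng: random.Random) -> Barrier:
    """Wrap a barrier so it is evaluated on a random subset of peers.

    Assumption: peers are sampled uniformly without replacement.
    """
    def sampled(my_step: int, peer_steps: Sequence[int]) -> bool:
        k = min(sample_size, len(peer_steps))
        return barrier(my_step, rng.sample(list(peer_steps), k))
    return sampled


if __name__ == "__main__":
    rng = random.Random(0)
    peers = [5, 5, 5, 2]          # one straggler at step 2
    sampled_bsp = with_sampling(bsp, sample_size=2, rng=rng)
    # The sampled barrier passes whenever the straggler is not drawn.
    passes = sum(sampled_bsp(5, peers) for _ in range(1000))
    print(f"sampled BSP passed {passes} of 1000 checks")
```

In this sketch, a sample size of zero imposes no constraint at all, while a sample size equal to the number of peers reproduces the wrapped barrier exactly, so the sample size acts as a knob between the two extremes.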
Bibliographical note
Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
- Federated learning
- edge computing
- distributed computing
- barrier control