Abstract
Split learning (SL) has recently been proposed as a way to enable resource-constrained devices to train multi-parameter neural networks (NNs) and participate in federated learning (FL). In a nutshell, SL splits the NN model into parts and allows clients (devices) to offload the largest part as a processing task to a computationally powerful helper. In parallel SL, multiple helpers can process model parts of one or more clients, thus considerably reducing the maximum training time over all clients (makespan). In this paper, we focus on orchestrating the workflow of this operation, which is critical in highly heterogeneous systems, as our experiments show. In particular, we formulate the joint problem of client-helper assignments and scheduling decisions with the goal of minimizing the training makespan, and we prove that it is NP-hard. We propose a solution method based on the decomposition of the problem by leveraging its inherent symmetry, and a second one that is fully scalable. A wealth of numerical evaluations using our testbed’s measurements allows us to build a solution strategy comprising these methods. Moreover, we show that this strategy finds a near-optimal solution and achieves a makespan up to 52.3% shorter than that of the baseline scheme.
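To make the notion of makespan concrete, the following is a minimal sketch of a generic greedy list-scheduling heuristic. It is not the method proposed in the paper. It assumes a strongly simplified model in which each client offloads a single task, each helper processes its assigned tasks one after another, and the makespan is the largest total load on any helper. The function name `greedy_assign` and the input `proc_time` are hypothetical and introduced only for illustration.

```python
def greedy_assign(proc_time):
    """Greedy client-to-helper assignment under a simplified model.

    proc_time[c][h] is an assumed (hypothetical) time for helper h to
    process the offloaded model part of client c.
    Returns (assignment, makespan), where assignment maps client -> helper.
    """
    n_helpers = len(proc_time[0])
    load = [0.0] * n_helpers  # accumulated processing time per helper
    assignment = {}

    # Handle clients with the largest best-case processing time first,
    # in the spirit of longest-processing-time-first list scheduling.
    order = sorted(range(len(proc_time)), key=lambda c: -min(proc_time[c]))

    for c in order:
        # Choose the helper on which this client's task would finish earliest.
        h = min(range(n_helpers), key=lambda j: load[j] + proc_time[c][j])
        load[h] += proc_time[c][h]
        assignment[c] = h

    # Makespan: the finishing time of the most loaded helper.
    return assignment, max(load)


# Example: four clients and two helpers, the second helper being slower.
assignment, makespan = greedy_assign([[4, 8], [3, 6], [2, 2], [5, 10]])
print(assignment, makespan)
```

A heuristic of this kind carries no optimality guarantee in heterogeneous settings, which is one reason the joint assignment and scheduling problem calls for more careful treatment.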
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the IEEE INFOCOM 2024 - IEEE Conference on Computer Communications |
| Publisher | IEEE |
| Pages | 1331-1340 |
| Number of pages | 10 |
| ISBN (Electronic) | 979-8-3503-8350-8 |
| ISBN (Print) | 979-8-3503-8351-5 |
| DOIs | |
| Publication status | Published - 2024 |
| Event | IEEE INFOCOM 2024 - IEEE Conference on Computer Communications - Vancouver, Canada. Duration: 20 May 2024 → 23 May 2024 |
Conference
| Conference | IEEE INFOCOM 2024 - IEEE Conference on Computer Communications |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 20/05/24 → 23/05/24 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Keywords
- Training
- Federated learning
- Computational modeling
- Artificial neural networks
- Task analysis
- Optimization