Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms

Rui Han; Shilin Li; Xiangwei Wang; Chi Harold Liu; Gaofeng Xin; Lydia Y. Chen

doi:10.1109/TPDS.2020.3046440

Rui Han, Shilin Li, Xiangwei Wang, Chi Harold Liu*, Gaofeng Xin, Lydia Y. Chen

*Corresponding author for this work

Data-Intensive Systems

Research output: Contribution to journal › Article › Scientific › peer-review

14 Citations (Scopus)

Abstract

With the exponential growth of data created at the network edge, decentralized and Gossip-based training of deep learning (DL) models on edge computing (EC) has gained tremendous research momentum, owing to its capability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in their hardware and software stacks, which results in high variation in training time and induces extra delay to synchronize and converge. The large body of prior art accelerates DL, through either data or model parallelization, via a centralized server, e.g., the parameter server scheme, which may easily turn into a system bottleneck or single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized and Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip based on popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads, reducing model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.
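As a rough illustration of the decentralized, Gossip-based parameter exchange the abstract refers to, the sketch below shows generic pairwise gossip averaging with no central server. It is not EdgeGossip itself and does not reproduce the paper's heterogeneity or accuracy-aware mechanisms; the function names (`gossip_round`, `spread`), the four-node setup, and the two-parameter "models" are all assumptions made for this example.

```python
import random


def gossip_round(params, rng):
    """One round of pairwise gossip averaging (illustrative only).

    Each node in turn picks a random peer, and both replace their
    parameters with the element-wise mean of the pair.
    """
    n = len(params)
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        avg = [(a + b) / 2.0 for a, b in zip(params[i], params[j])]
        params[i] = list(avg)
        params[j] = list(avg)
    return params


def spread(params):
    """Largest per-coordinate gap between any two nodes (0 means consensus)."""
    return max(max(col) - min(col) for col in zip(*params))


rng = random.Random(0)
# Four hypothetical edge nodes, each holding a 2-parameter "model".
nodes = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
initial_spread = spread(nodes)

for _ in range(20):
    gossip_round(nodes, rng)

# Pairwise averaging keeps the network-wide mean fixed while the nodes
# drift toward one another.
final_mean = [sum(col) / len(nodes) for col in zip(*nodes)]
final_spread = spread(nodes)
print(final_mean, final_spread)
```

In a real training loop each node would also take local gradient steps between exchanges; here only the averaging step is shown, which is the part that replaces the parameter server.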

Original language	English
Article number	9303468
Pages (from-to)	1591-1602
Number of pages	12
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	32
Issue number	7
DOIs	https://doi.org/10.1109/TPDS.2020.3046440
Publication status	Published - 2021

Keywords

decentralized training
Deep learning
edge computing
gossip

Access to Document

10.1109/TPDS.2020.3046440

Cite this

@article{e94cdf757a0c47378ebd90e7049535b0,

title = "Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms",

abstract = "With the exponential growth of data created at the network edge, decentralized and Gossip-based training of deep learning (DL) models on edge computing (EC) has gained tremendous research momentum, owing to its capability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in their hardware and software stacks, which results in high variation in training time and induces extra delay to synchronize and converge. The large body of prior art accelerates DL, through either data or model parallelization, via a centralized server, e.g., the parameter server scheme, which may easily turn into a system bottleneck or single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized and Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip based on popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads, reducing model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.",

keywords = "decentralized training, Deep learning, edge computing, gossip",

author = "Rui Han and Shilin Li and Xiangwei Wang and Liu, {Chi Harold} and Gaofeng Xin and Chen, {Lydia Y.}",

year = "2021",

doi = "10.1109/TPDS.2020.3046440",

language = "English",

volume = "32",

pages = "1591--1602",

journal = "IEEE Transactions on Parallel and Distributed Systems",

issn = "1045-9219",

publisher = "IEEE",

number = "7",

}

TY - JOUR

T1 - Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms

AU - Han, Rui

AU - Li, Shilin

AU - Wang, Xiangwei

AU - Liu, Chi Harold

AU - Xin, Gaofeng

AU - Chen, Lydia Y.

PY - 2021

Y1 - 2021

N2 - With the exponential growth of data created at the network edge, decentralized and Gossip-based training of deep learning (DL) models on edge computing (EC) has gained tremendous research momentum, owing to its capability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in their hardware and software stacks, which results in high variation in training time and induces extra delay to synchronize and converge. The large body of prior art accelerates DL, through either data or model parallelization, via a centralized server, e.g., the parameter server scheme, which may easily turn into a system bottleneck or single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized and Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip based on popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads, reducing model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.

AB - With the exponential growth of data created at the network edge, decentralized and Gossip-based training of deep learning (DL) models on edge computing (EC) has gained tremendous research momentum, owing to its capability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in their hardware and software stacks, which results in high variation in training time and induces extra delay to synchronize and converge. The large body of prior art accelerates DL, through either data or model parallelization, via a centralized server, e.g., the parameter server scheme, which may easily turn into a system bottleneck or single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized and Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip based on popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads, reducing model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.

KW - decentralized training

KW - Deep learning

KW - edge computing

KW - gossip

UR - http://www.scopus.com/inward/record.url?scp=85098757067&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2020.3046440

DO - 10.1109/TPDS.2020.3046440

M3 - Article

AN - SCOPUS:85098757067

SN - 1045-9219

VL - 32

SP - 1591

EP - 1602

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

IS - 7

M1 - 9303468

ER -
