TY - JOUR
T1 - Memory-aware and context-aware multi-DNN inference on the edge
AU - Cox, Bart
AU - Birke, Robert
AU - Chen, Lydia Y.
PY - 2022
Y1 - 2022
N2 - Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real-time image-based analysis. Increasingly, multi-faceted knowledge is extracted by executing multiple DNN inference models, e.g., identifying objects, faces, and genders from images. It is of paramount importance to guarantee low response times of such multi-DNN executions, as it affects not only users' quality of experience but also safety. The challenge, largely unaddressed by the state of the art, is how to overcome the memory limitation of edge devices without altering the DNN models. In this paper, we design and implement MASA, a responsive memory-aware multi-DNN execution and scheduling framework, which requires no modification of DNN models. The aim of MASA is to consistently ensure low average response times when deterministically and stochastically executing multiple DNN-based image analyses. The enabling features of MASA are (i) modeling inter- and intra-network dependency, (ii) leveraging the complementary memory usage of each layer, and (iii) exploiting the context dependency of DNNs. We verify the correctness and scheduling optimality via mixed integer programming. We extensively evaluate two versions of MASA, context-oblivious and context-aware, on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that MASA can achieve average response times lower by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to state-of-the-art multi-DNN scheduling solutions.
KW - Average response time
KW - Edge devices
KW - Memory-aware scheduling
KW - Multiple DNNs inference
UR - http://www.scopus.com/inward/record.url?scp=85129515528&partnerID=8YFLogxK
U2 - 10.1016/j.pmcj.2022.101594
DO - 10.1016/j.pmcj.2022.101594
M3 - Article
AN - SCOPUS:85129515528
VL - 83
JO - Pervasive and Mobile Computing
JF - Pervasive and Mobile Computing
SN - 1574-1192
M1 - 101594
ER -