TY - GEN
T1 - Masa
T2 - 19th IEEE International Conference on Pervasive Computing and Communications, PerCom 2021
AU - Cox, Bart
AU - Galjaard, Jeroen
AU - Ghiassi, Amirmasoud
AU - Birke, Robert
AU - Chen, Lydia Y.
PY - 2021
Y1 - 2021
AB - Deep neural networks (DNNs) are becoming core components of many applications running on edge devices, especially for real-time image-based analysis. Increasingly, multi-faceted knowledge is extracted by executing multiple DNN inference models, e.g., to identify objects, faces, and genders in images. The response times of multi-DNN execution strongly affect users' quality of experience as well as their safety. Different DNNs exhibit diverse resource requirements and execution patterns across layers and networks, which can easily exceed the available device memory and degrade responsiveness. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution framework: an on-device middleware that models inter- and intra-network dependencies and leverages the complementary memory usage of each layer. Masa can consistently ensure low average response times when executing multiple DNN-based image analyses, both deterministically and stochastically. We extensively evaluate Masa on three Raspberry Pi configurations and a large set of popular DNN models triggered by different image generation patterns. Our evaluation results show that Masa achieves up to 90% lower average response times on devices with small memory, i.e., 512 MB to 1 GB, compared to state-of-the-art multi-DNN scheduling solutions.
KW - average response time
KW - edge devices
KW - memory-aware scheduling
KW - Multiple DNNs inference
UR - http://www.scopus.com/inward/record.url?scp=85107560409&partnerID=8YFLogxK
U2 - 10.1109/PERCOM50583.2021.9439111
DO - 10.1109/PERCOM50583.2021.9439111
M3 - Conference contribution
AN - SCOPUS:85107560409
T3 - 2021 IEEE International Conference on Pervasive Computing and Communications, PerCom 2021
BT - 2021 IEEE International Conference on Pervasive Computing and Communications, PerCom 2021
PB - Institute of Electrical and Electronics Engineers (IEEE)
Y2 - 22 March 2021 through 26 March 2021
ER -