MemA: Fast Inference of Multiple Deep Models

Jeroen Galjaard, Bart Cox, Amirmasoud Ghiassi, Lydia Y. Chen, Robert Birke

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

The execution of deep neural network (DNN) inference jobs on edge devices has become increasingly popular. Multiple such inference models can concurrently analyse the on-device data, e.g. images, to extract valuable insights. Prior art focuses on low-power accelerators, compressed neural network architectures, and specialized frameworks to reduce the execution time of single inference jobs on resource-constrained edge devices. However, little is known about how different scheduling policies can further improve the runtime performance of multi-inference jobs without additional edge resources. To enable the exploration of scheduling policies, we first develop an execution framework, EdgeCaffe, which splits DNN inference jobs into the loading and execution of each network layer. We empirically characterize the impact of loading and scheduling policies on the execution time of multi-inference jobs and point out their dependency on the available memory space. We propose a novel memory-aware scheduling policy, MemA, which opportunistically interleaves the executions of different types of DNN layers based on their estimated run-time memory demands. Our evaluation on exhaustive combinations of five networks, data inputs, and memory configurations shows that MemA can alleviate the degradation of execution times of multi-inference (up to 5×) under severely constrained memory compared to standard scheduling policies, without affecting accuracy.
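The memory-aware interleaving idea can be illustrated with a minimal sketch. The Python code below is not the paper's EdgeCaffe/MemA implementation; it is a hypothetical illustration in which every inference job is a queue of per-layer tasks carrying an estimated memory demand, and the scheduler opportunistically picks a pending layer that still fits the remaining memory budget, falling back to the job closest to completion when nothing fits. All names (LayerTask, mem_aware_schedule, the layer names and memory numbers) are assumptions made for this example.

from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class LayerTask:
    name: str                # layer identifier, e.g. "conv1" (hypothetical)
    est_mem_mb: float        # estimated run-time memory demand of this layer
    run: Callable[[], None]  # callback that loads and executes the layer


def mem_aware_schedule(jobs: Dict[str, Deque[LayerTask]], mem_budget_mb: float) -> None:
    """Greedily interleave per-layer steps of several inference jobs so that
    the sum of resident memory estimates stays within mem_budget_mb."""
    resident: Dict[str, float] = {jid: 0.0 for jid in jobs}  # memory held by unfinished nets
    while any(jobs.values()):
        used = sum(resident.values())
        # Candidates: the next pending layer of each job that still fits the budget.
        fitting = [(jid, q[0]) for jid, q in jobs.items()
                   if q and used + q[0].est_mem_mb <= mem_budget_mb]
        if fitting:
            # Opportunistically run the largest layer that fits right now.
            jid, task = max(fitting, key=lambda p: p[1].est_mem_mb)
        else:
            # Nothing fits: advance the job closest to completion to free memory.
            jid = min((j for j, q in jobs.items() if q), key=lambda j: len(jobs[j]))
            task = jobs[jid][0]
        jobs[jid].popleft()
        task.run()                         # load weights + execute the layer
        resident[jid] += task.est_mem_mb
        if not jobs[jid]:                  # network finished: release its memory
            resident[jid] = 0.0


if __name__ == "__main__":
    # Toy example: two concurrent inference jobs under a 300 MB budget
    # (layer names and memory numbers are made up for illustration).
    jobs = {
        "face_net": deque([LayerTask("conv1", 120.0, lambda: print("face_net conv1")),
                           LayerTask("fc1", 200.0, lambda: print("face_net fc1"))]),
        "age_net": deque([LayerTask("conv1", 80.0, lambda: print("age_net conv1"))]),
    }
    mem_aware_schedule(jobs, mem_budget_mb=300.0)

In this sketch the scheduler runs face_net's conv1 first, then slots in age_net's small layer while face_net's large fully connected layer would exceed the budget, which mirrors the kind of opportunistic interleaving the abstract describes.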

Original language: English
Title of host publication: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 281-286
Number of pages: 6
ISBN (Electronic): 9781665404242
DOIs
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2021 - Kassel, Germany
Duration: 22 Mar 2021 - 26 Mar 2021

Publication series

Name: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2021

Conference

Conference: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2021
Country/Territory: Germany
City: Kassel
Period: 22/03/21 - 26/03/21

Keywords

  • Constrained memory
  • Deep neural networks
  • Edge computing
  • Memory aware
  • Multi-inference
  • Scheduling
