EDGETUNE: Inference-Aware Multi-Parameter Tuning

Isabelly Rocha, Pascal Felber, Valerio Schiavoni, Lydia Chen

Research output: Conference contribution in book/conference proceedings (scientific, peer-reviewed)


Abstract

Deep Neural Networks (DNNs) have demonstrated impressive performance on many machine-learning tasks such as image recognition and language modeling, and are becoming prevalent even on mobile platforms. Despite this, designing neural architectures remains a manual, time-consuming process that requires profound domain knowledge. Recently, Parameter Tuning Servers have gathered the attention of industry and academia. These systems allow users from all domains to automatically achieve the desired model accuracy for their applications. However, although the entire process of tuning and training models is performed solely so that models can be deployed for inference, state-of-the-art approaches typically ignore system-oriented and inference-related objectives such as runtime, memory usage, and power consumption. This is a challenging problem: besides adding one more dimension to an already complex problem, the information about edge devices available to the user is rarely known or complete. To accommodate all these objectives together, it is crucial for a tuning system to take a holistic approach and consider parameters at all levels simultaneously. We present EdgeTune, a novel inference-aware parameter tuning server. It tunes parameters at all levels, backed by an optimization function that captures multiple objectives. Our approach relies on estimated inference metrics collected from our emulation server, which runs asynchronously from the main tuning process. The latter can then leverage the inference performance while still tuning the model. We propose a novel one-fold tuning algorithm that employs the principle of multi-fidelity and simultaneously explores multiple tuning budgets, which prior art can only handle as a suboptimal case with a single type of budget. EdgeTune outputs inference recommendations to the user while improving tuning time and energy by at least 18% and 53%, respectively, compared to the baseline.
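The abstract describes an optimization function that combines model accuracy with system-oriented inference objectives (runtime, memory, power). As a minimal illustrative sketch only (not the paper's actual formulation), one could rank tuning trials with a weighted score that normalizes each inference metric against a per-device budget; all function names, weights, and budget values below are hypothetical:

```python
# Hypothetical sketch of an inference-aware, multi-objective scoring
# function: accuracy is rewarded, while each estimated inference metric
# is penalized as a fraction of an assumed per-device budget.

def inference_aware_score(accuracy, runtime_ms, memory_mb, power_w,
                          weights=(1.0, 0.3, 0.2, 0.2),
                          budgets=(100.0, 512.0, 5.0)):
    """Return a scalar score (higher is better) for one trial.

    Inference metrics are normalized by per-device budgets so that
    objectives measured in different units become comparable.
    """
    w_acc, w_rt, w_mem, w_pow = weights
    rt_budget, mem_budget, pow_budget = budgets
    # Penalize each system metric by its share of the device budget.
    penalty = (w_rt * runtime_ms / rt_budget
               + w_mem * memory_mb / mem_budget
               + w_pow * power_w / pow_budget)
    return w_acc * accuracy - penalty

# A light, frugal configuration can outrank a slightly more accurate
# but much heavier one:
light = inference_aware_score(0.90, runtime_ms=40, memory_mb=128, power_w=1.5)
heavy = inference_aware_score(0.92, runtime_ms=120, memory_mb=480, power_w=4.5)
assert light > heavy
```

Under such a formulation, the tuner can trade a small accuracy loss for large gains in runtime, memory, or power on the target edge device, which is the trade-off the paper's holistic approach is designed to navigate.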

Original language: English
Title of host publication: Middleware 2022 - Proceedings of the 23rd ACM/IFIP International Middleware Conference
Publisher: Association for Computing Machinery (ACM)
Pages: 1-14
Number of pages: 14
ISBN (Electronic): 9781450393409
Publication status: Published - 2022
Event: 23rd ACM/IFIP International Middleware Conference, Middleware 2022 - Quebec, Canada
Duration: 7 Nov 2022 - 11 Nov 2022

Publication series

Name: Middleware 2022 - Proceedings of the 23rd ACM/IFIP International Middleware Conference

Conference

Conference: 23rd ACM/IFIP International Middleware Conference, Middleware 2022
Country/Territory: Canada
City: Quebec
Period: 7/11/22 - 11/11/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Keywords

  • deep neural networks
  • inference
  • training
  • tuning

