TY - GEN
T1 - Energy-Aware Vision Model Partitioning for Edge AI
AU - Katare, Dewant
AU - Zhou, Mengying
AU - Chen, Yang
AU - Janssen, Marijn
AU - Ding, Aaron Yi
PY - 2025
Y1 - 2025
N2 - Deploying scalable Vision Transformer applications on mobile and edge devices is constrained by limited memory and computational resources. Existing model development and deployment strategies include distributed computing and inference methods such as federated learning, split computing, collaborative inference, and edge-cloud offloading. While these strategies have deployment advantages, they do not optimize memory usage and processing efficiency, resulting in increased energy consumption. This paper reduces energy consumption by introducing adaptive model partitioning mechanisms and dynamic scaling methods for ViTs such as EfficientViT and TinyViT, adjusting model complexity to the available computational resources and operating conditions. We implement energy-efficient strategies that minimize inter-layer communication for distributed machine learning across edge devices, thereby reducing the energy consumed by data transfer and computation. Our evaluations on a series of benchmark models show improvements of up to a 32.6% reduction in latency and 16.6% energy savings, while keeping mean average precision within 2.5 to 4.5% of the baseline models. These results show that our approach is practical for improving the sustainability and efficiency of edge AI.
KW - edge computing
KW - energy-aware computing
KW - model partition
UR - http://www.scopus.com/inward/record.url?scp=105006440758&partnerID=8YFLogxK
DO - 10.1145/3672608.3707792
M3 - Conference contribution
AN - SCOPUS:105006440758
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 671
EP - 678
BT - 40th Annual ACM Symposium on Applied Computing, SAC 2025
PB - ACM
T2 - 40th Annual ACM Symposium on Applied Computing, SAC 2025
Y2 - 31 March 2025 through 4 April 2025
ER -