
Abstract

<jats:p>The benefits of Deep Learning have escalated exponentially in recent years, pushing its adoption from cloud and high-end servers to the edge of distributed infrastructures. On the other hand, the tight energy budgets and reduced power envelopes of edge and IoT devices introduce new challenges for efficient, yet compute-intensive, inference. Additionally, modern edge-computing platforms require multi-tasking and multi-tenancy support to cope with the parallel nature of near-data edge-AI workloads. Among edge-class platforms available for inference acceleration, FPGA technology stands out as a flexible and energy-efficient solution, thanks to its low-power footprint and field-adaptive capabilities. Unfortunately, state-of-the-art edge-class FPGA platforms fall short of offering low-level accelerator control or multi-tenancy support for Deep Learning deployments, resulting in significant energy waste, a cost that edge platforms cannot afford. In this work, we introduce a compute-efficient runtime and energy evaluation framework for heterogeneous multi-NPU architectures on edge-class MPSoCs. Our framework is built upon a lightweight, hardware-calibrated, measurement-centric analytical model capturing NPU inference runtime and power draw. We leverage this compute-efficient model to design and deploy a set of online, on-device allocation policies that make energy-aware decisions for multi-tenant DNN inference workloads. We systematically evaluate our multi-NPU energy model and allocation policies against several multi-tenant workloads based on the MLPerf benchmark. Our validation shows that our runtime- and energy-aware multi-tenant allocation policies surpass both the baseline hardware scheduling framework and related work. Finally, we discuss the tradeoffs and limitations of our approach and validate its accuracy through non-invasive power consumption measurements on the target platform's power rails. We reach 92% and 85% average accuracy with respect to online measurements of runtime and energy consumption, respectively, for the demonstrated workloads on the target multi-NPU platform.</jats:p>


Keywords

energy, platforms, power, workloads, runtime
