
Abstract

<jats:p>The benefits of Deep Learning have escalated exponentially in recent years, pushing its adoption from cloud and high-end servers to the edge of distributed infrastructures. On the other hand, the tight energy budgets and reduced power envelopes of edge and IoT devices introduce new challenges for efficient, yet compute-intensive, inference. Additionally, modern edge-computing platforms require multi-tasking and multi-tenancy support to cope with the parallel nature of near-data edge-AI workloads. Among edge-class platforms available for inference acceleration, FPGA technology stands out as a flexible and energy-efficient solution, thanks to its low-power footprint and field-adaptive capabilities. Unfortunately, state-of-the-art edge-class FPGA platforms fall short of offering low-level accelerator control or multi-tenancy support for Deep Learning deployments, resulting in significant energy waste, a cost that edge platforms cannot afford. In this work, we introduce a compute-efficient runtime and energy evaluation framework for heterogeneous multi-NPU architectures on edge-class MPSoCs. Our framework is built upon a lightweight, hardware-calibrated, measurement-centric analytical model capturing NPU inference runtime and power draw. We leverage this compute-efficient model to design and deploy a set of online, on-device allocation policies that make energy-aware decisions for multi-tenant DNN inference workloads. We systematically evaluate our multi-NPU energy model and allocation policies against several multi-tenant workloads based on the MLPerf benchmark. Our validation shows that our runtime- and energy-aware multi-tenant allocation policies surpass both the baseline hardware scheduling framework and related work. Finally, we discuss the tradeoffs and limitations of our approach and validate its accuracy through non-invasive power consumption measurements on the target platform's power rails. We reach 92% and 85% average accuracy with respect to online measurements of runtime and energy consumption, respectively, for the demonstrated workloads on the target multi-NPU platform.</jats:p>


Keywords

energy, platforms, power, workloads, runtime
