Machine Learning Performance at the Edge: When to Offload an Inference Task

Olga Chukhno, Gurtaj Singh, Claudia Campolo, Antonella Molinaro, Carla-Fabiana Chiasserini

NET4us@MobiCom (2023)

Abstract
Machine Learning (ML) techniques play a crucial role in extracting valuable insights from the large amounts of data massively collected through networked sensing systems. Given the increased capabilities of user devices and the growing demand for inference in mobile sensing applications, we are witnessing a paradigm shift where inference is executed at the end devices instead of burdening the network and cloud infrastructures. This paper investigates the performance of inference execution at the network edge and at end devices, when using both a full and a pruned model. While pruning reduces model size, thus making the model amenable to execution at an end device and decreasing the communication footprint, trade-offs in time complexity, potential accuracy loss, and energy consumption must be accounted for. We tackle such trade-offs through extensive experiments under various ML models, edge load conditions, and pruning factors. Our results show that executing a pruned model provides time and energy (on the device side) savings of up to 40% and 53%, respectively, with respect to the full model. Also, executing inference at the end device may lead to 60% faster decision-making compared to inference execution at a highly loaded edge.
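To make the abstract's two levers concrete, the sketch below illustrates (a) magnitude-based weight pruning and (b) a latency-driven offload decision. It is a minimal illustration only: the paper does not publish its code, so the use of PyTorch's L1 unstructured pruning, the pruning factor, and the queueing-style load penalty in `should_offload` are all assumptions made for exposition, not the authors' actual method.

```python
# Illustrative sketch only; PRUNING_FACTOR and all latency figures are
# hypothetical values, not measurements from the paper.
import torch.nn as nn
import torch.nn.utils.prune as prune

PRUNING_FACTOR = 0.4  # fraction of weights removed (assumed value)


def prune_model(model: nn.Module, amount: float = PRUNING_FACTOR) -> nn.Module:
    """Apply L1-magnitude unstructured pruning to Linear/Conv2d layers."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # make the sparsity permanent
    return model


def should_offload(t_local_s: float, t_edge_compute_s: float,
                   t_uplink_s: float, edge_load: float) -> bool:
    """Offload only if the estimated end-to-end edge latency beats local
    inference. edge_load in [0, 1) inflates edge compute time to mimic
    queueing at a loaded server (a simplifying assumption)."""
    t_edge_total = t_uplink_s + t_edge_compute_s / max(1.0 - edge_load, 1e-6)
    return t_edge_total < t_local_s


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    pruned = prune_model(model)
    # At high edge load, local execution of the pruned model wins:
    # 0.015 + 0.020 / 0.2 = 0.115 s > 0.050 s locally -> stay on device.
    print(should_offload(t_local_s=0.050, t_edge_compute_s=0.020,
                         t_uplink_s=0.015, edge_load=0.8))  # -> False
```

The decision rule captures the abstract's headline result in miniature: as `edge_load` grows, the effective edge latency inflates, and running the (smaller, faster) pruned model on the device becomes the better choice.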