OpTIFlow - An optimized end-to-end dataflow for accelerating deep learning workloads on heterogeneous SoCs.

Shyam Jagannathan, Vijay Pothukuchi, Jesse Villarreal, Kumar Desappan, Manu Mathew, Rahul Ravikumar, Aniket Limaye, Mihir Mody, Pramod Swami, Piyali Goswami, Carlos Rodriguez, Emmanuel Madrigal, Marco Herrera

Autonomous Vehicles and Machines (2023)

Abstract
A typical edge compute SoC capable of handling deep learning workloads at low power is usually heterogeneous by design. It typically comprises multiple initiators such as real-time IPs for capture and display, hardware accelerators for ISP, computer vision, deep learning engines, and codecs, DSP or ARM cores for general compute, and a GPU for 2D/3D visualization. Every participating initiator transacts with common resources such as the L3/L4/DDR memory systems to seamlessly exchange data. Careful orchestration of this dataflow is important to keep every producer/consumer at full utilization without causing any drop in real-time performance, which is critical for automotive applications. The software stack for such complex workflows can be quite intimidating for customers to bring up and often acts as an entry barrier that keeps many from even evaluating the device for performance. In this paper we propose techniques developed on TI's latest TDA4V-Mid SoC, targeted at ADAS and autonomous applications, which are designed around ease of use while ensuring device-entitlement-class performance using open standards such as DL runtimes, OpenVX, and GStreamer.
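The end-to-end flow the abstract describes (capture, hardware-accelerated preprocessing, deep learning inference, and display, each handled by a dedicated initiator on the SoC) is typically expressed as a GStreamer pipeline. Below is a minimal Python sketch of such a pipeline under stated assumptions; the TI-specific element names (tiovxmultiscaler, tidlinferer, tidlpostproc), their properties, and the device/model paths are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an end-to-end camera -> scale -> DL inference -> display
# dataflow expressed in GStreamer. Element names and properties below are assumed
# for illustration and may differ per SDK release.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

PIPELINE = (
    "v4l2src device=/dev/video0 ! video/x-raw,width=1280,height=720 ! "
    "tiovxmultiscaler ! video/x-raw,width=512,height=512 ! "   # hardware scaler offload (assumed element)
    "tidlinferer model=/opt/model_zoo/od-model ! "             # deep learning accelerator offload (assumed element/property)
    "tidlpostproc ! "                                           # overlay detections on the frame (assumed element)
    "kmssink sync=false"                                        # display via DRM/KMS
)

# Build the pipeline from the launch string and start streaming.
pipeline = Gst.parse_launch(PIPELINE)
pipeline.set_state(Gst.State.PLAYING)

# Block until an error or end-of-stream, then tear the pipeline down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```

Keeping each stage as a separate GStreamer element mirrors the producer/consumer orchestration the abstract emphasizes: every hardware block can run concurrently on its own buffers while the framework handles handoff through shared memory.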
Keywords
deep learning workloads, deep learning, end-to-end