Optimizing Halide for Digital Signal Processors

2020 IEEE Workshop on Signal Processing Systems (SiPS) (2020)

Abstract
Halide, a domain-specific language (DSL) for image and array processing, promotes the separation of the functional algorithm from its execution schedule, making it easier for the user to optimize code for different hardware platforms. Halide supports multiple back-end APIs, including CUDA and OpenCL for GPUs. Although many modern Digital Signal Processors (DSPs) support OpenCL, achieving high performance on these devices requires features beyond those needed to support GPUs. Without those features, there is a strict limit to the effectiveness of targeting a DSP through the Halide OpenCL back-end. In this paper, we describe a set of Halide extensions and optimizations required to effectively support a DSP target, including DMA Promotion, Type Width Reduction, and Intrinsic Generation. We evaluate the effects of our optimizations on a Cadence Vision DSP and report the results. On average, we observe an 88X speedup over the baseline generated OpenCL code, and the performance is comparable to handcrafted OpenCL for the target platform.
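The separation of algorithm from schedule mentioned above can be illustrated with a minimal sketch in plain C++. This is not Halide code and not the paper's implementation: the names `blur_at`, `blur_simple`, and `blur_tiled` are hypothetical, and the 3-tap blur is only an assumed stand-in for an image-processing kernel. The point is that the per-element definition (the "algorithm") stays fixed while the loop structure (the "schedule") changes, and both orderings produce the same output.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Algorithm" (hypothetical example): the value of output element x,
// a 3-tap box blur with edges clamped to the valid index range.
// Must only be called with a non-empty input and x < in.size().
inline std::uint8_t blur_at(const std::vector<std::uint8_t>& in, std::size_t x) {
    const std::size_t last = in.size() - 1;
    const std::size_t l = (x == 0) ? 0 : x - 1;
    const std::size_t r = std::min(x + 1, last);
    // The 8-bit operands are promoted to int, so the sum cannot overflow.
    const int sum = in[l] + in[x] + in[r];
    return static_cast<std::uint8_t>(sum / 3);
}

// "Schedule" A: a single straightforward pass over the output.
std::vector<std::uint8_t> blur_simple(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t x = 0; x < in.size(); ++x) {
        out[x] = blur_at(in, x);
    }
    return out;
}

// "Schedule" B: the same algorithm, visited tile by tile.
// tile must be greater than zero. On a real DSP, a tile might correspond
// to a block of data staged in local memory; that mapping is an
// assumption made for illustration only.
std::vector<std::uint8_t> blur_tiled(const std::vector<std::uint8_t>& in,
                                     std::size_t tile) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t x0 = 0; x0 < in.size(); x0 += tile) {
        const std::size_t x1 = std::min(x0 + tile, in.size());
        for (std::size_t x = x0; x < x1; ++x) {
            out[x] = blur_at(in, x);
        }
    }
    return out;
}
```

In Halide itself, the schedule is expressed through scheduling directives on the pipeline rather than by rewriting loops by hand; the sketch only shows why changing the traversal leaves the result unchanged.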
Keywords
Optimization,Graphics processing units,Hardware,Schedules,Performance evaluation,Productivity