Bridging Architecture and Programming for Throughput-Oriented Vision Processing (Abstract Only).

FPGA '15: The 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, February 2015

Abstract
With the expansion of OpenCL support across many heterogeneous devices (including FPGAs, GPUs, and CPUs), the programmability of these systems has increased significantly. At the same time, new questions arise about which device should be targeted for each OpenCL software kernel. Once a device is selected, the application must still be customized by choosing the right granularity of parallelism and the frequency of host-to-device communication. In this paper, we study the impact of source-level decisions on overall execution time when developing OpenCL programs across different heterogeneous devices. We focus on two mainstream architecture classes (GPUs and FPGAs) and consider throughput-oriented advanced vision processing. To guide this exploration, we propose a new vertical classification for selecting the grain of parallelism for advanced vision processing applications. To carry out this study, we have selected the Mean-shift object tracking algorithm as a representative candidate of advanced vision algorithms. Overall, our evaluation demonstrates that fine-grained parallelism can greatly benefit FPGA execution (up to a 4X speed-up), while a combination of coarse-grained and fine-grained parallelism achieves the best performance on a GPU (up to a 6X speed-up). There can also be a large benefit if both the parallel and serial parts of the program are executed on an FPGA (up to a 21X speed-up).