Hardware Implementation of Accumulated Value Calculation for Two-Dimensional Continuous Dynamic Programming

Embedded Multicore SoCs (2012)

Abstract
We propose an efficient hardware accelerator for calculating the accumulated values of two-dimensional continuous dynamic programming (2DCDP). 2DCDP is a powerful optimal pixel-matching algorithm between input and reference images that can be applied to image processing tasks such as image recognition, image search, feature tracking, and 3D reconstruction. However, it requires long computation times because its time and space complexities are both O(N^4). We analyze the computation flow of the 2DCDP algorithm and propose a high-performance architecture for a hardware accelerator. Parallelized accumulated minimum local distance calculators and a toggle memory structure are newly introduced to reduce the computation cost and memory usage. The proposed architecture is implemented on an FPGA (Stratix IV, EP4SE820), where its maximum operation frequency is 125.71 MHz. A preliminary evaluation shows that parallel processing with 32 PEs for the accumulated value calculation on 32x32 input and reference images achieves a speedup of up to 77 times at an operation frequency of 100 MHz, compared with processing on a multi-core processor.
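
The abstract does not spell out the toggle memory structure, so the following is only a minimal sketch of the general idea it appears to refer to: keeping two buffers that swap roles at each step, so that an accumulated-minimum recurrence never stores the full table. The recurrence here is a simple one-image, three-neighbor stand-in (an assumption for illustration), not the 2DCDP recurrence, and the function name `accumulated_min_distance` is hypothetical.

```python
def accumulated_min_distance(local):
    """Accumulate minimum local distances row by row.

    `local` is a rows x cols grid of local distances. Only two row
    buffers are kept; they toggle between "previous" and "current"
    roles, so memory is O(cols) instead of O(rows * cols).
    NOTE: illustrative stand-in recurrence, not the 2DCDP one.
    """
    rows, cols = len(local), len(local[0])
    buffers = [[0] * cols, [0] * cols]
    cur = 0

    # First row: each cell can only be reached from its left neighbor.
    running = 0
    for j in range(cols):
        running += local[0][j]
        buffers[cur][j] = running

    for i in range(1, rows):
        # Toggle: the buffer just written becomes the read-only one.
        prev, cur = cur, 1 - cur
        for j in range(cols):
            best = buffers[prev][j]                      # from above
            if j > 0:
                best = min(best,
                           buffers[prev][j - 1],         # from diagonal
                           buffers[cur][j - 1])          # from left
            buffers[cur][j] = local[i][j] + best

    return buffers[cur]  # accumulated values of the last row


if __name__ == "__main__":
    grid = [[1, 2, 3],
            [4, 1, 2],
            [7, 8, 1]]
    print(accumulated_min_distance(grid))
```

In this sketch the cells of one row depend on each other through the left neighbor, so they are computed sequentially. A hardware design with parallel PEs would need a dependency pattern that allows several cells to be updated at once, and the abstract does not say how the paper arranges this.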
Keywords
maximum operation frequency, hardware implementation, image search, reference image, computation cost, large computation time, computation flow, accumulated value calculation, image processing, image recognition, parallel processing, two-dimensional continuous dynamic programming, efficient hardware accelerator, dynamic programming, pattern matching, hardware, image segmentation, multicore processing, field programmable gate arrays, computational complexity, FPGA