Computing Utilization Enhancement for Chiplet-based Homogeneous Processing-in-Memory Deep Learning Processors

Great Lakes Symposium on VLSI (2021)

Abstract
This paper presents a design strategy for chiplet-based processing-in-memory systems targeting deep neural network applications. Monolithic silicon chips are area- and power-limited, failing to keep pace with the recent rapid growth of deep learning algorithms. The paper first demonstrates a straightforward layer-wise method that partitions the workload of a monolithic accelerator into a multi-chiplet pipeline. A quantitative analysis shows that this straightforward separation degrades the overall utilization of computing resources because of the reduced on-chiplet memory size, which introduces a higher memory wall. A tile interleaving strategy is proposed to overcome this degradation: it segments one layer across different chiplets, which maximizes computing utilization. The hardware modifications to the chiplet system needed to support this strategy are also discussed. To validate the proposed strategy, a nine-chiplet processing-in-memory system is evaluated with a custom-designed object detection network. Each chiplet achieves a peak performance of 204.8 GOPS at a 100-MHz clock rate. The peak performance of the overall system is 1.711 TOPS, with no off-chip memory access needed. With the tile interleaving strategy, the utilization is improved from 53.9% to 92.8%.
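The abstract contrasts a layer-per-chiplet pipeline with the proposed tile interleaving. Below is a minimal toy sketch of that contrast, assuming made-up per-layer cycle counts, a hypothetical tile size, and a simple greedy tile placement; it illustrates the general load-balancing idea only and is not the paper's actual mapping, hardware model, or reported numbers.

```python
# Toy model (illustration only, not the paper's analysis): why a layer-wise
# pipeline under-utilizes compute, and how interleaving tiles of each layer
# across chiplets balances the load. All cycle counts below are made up.

NUM_CHIPLETS = 9

# Hypothetical compute cycles per layer of a 9-layer detection network.
layer_cycles = [40, 110, 260, 300, 220, 150, 280, 70, 30]

def layerwise_utilization(cycles):
    """One layer per chiplet: pipeline throughput is set by the slowest
    stage, so every other chiplet idles while the bottleneck layer runs."""
    bottleneck = max(cycles)
    return sum(cycles) / (len(cycles) * bottleneck)

def tile_interleaved_utilization(cycles, num_chiplets, tile_size=10):
    """Each layer is segmented into tiles that are spread over all chiplets;
    a greedy least-loaded assignment keeps the per-chiplet work balanced."""
    tiles = []
    for c in cycles:
        full, rem = divmod(c, tile_size)
        tiles += [tile_size] * full + ([rem] if rem else [])
    tiles.sort(reverse=True)
    loads = [0] * num_chiplets
    for t in tiles:
        loads[loads.index(min(loads))] += t  # place tile on least-loaded chiplet
    return sum(loads) / (num_chiplets * max(loads))

if __name__ == "__main__":
    print(f"layer-wise pipeline: {layerwise_utilization(layer_cycles):.1%}")
    print(f"tile interleaving:   {tile_interleaved_utilization(layer_cycles, NUM_CHIPLETS):.1%}")
```

With these assumed numbers the toy model gives roughly 54% utilization for the layer-wise pipeline (bounded by the bottleneck layer) versus about 95% with interleaved tiles; the exact figures depend entirely on the assumed cycle counts and tile size.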