An Associativity-Agnostic in-Cache Computing Architecture Optimized for Multiplication

2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC), 2019

Abstract
With the spread of cloud services and the Internet of Things, machine learning and artificial intelligence based analytics are becoming pervasive in everyday life. However, efficient deployment of these data-intensive services requires performing computations closer to the edge. In this context, in-cache computing, based on bitline computing, is a promising way to execute data-intensive algorithms energy-efficiently by mitigating data movement in the cache hierarchy and exploiting data parallelism. Nevertheless, previous in-cache computing architectures suffer from serious circuit-level deficiencies (i.e., low bitcell density, data corruption risks, and limited performance) and thus report high latency for multiplication, a key operation in machine learning and deep learning. Moreover, no previous work addresses way misalignment, which strongly constrains data placement if performance gains are to be preserved. In this work, we drastically improve the previously proposed BLADE architecture for in-cache computing to efficiently support multiplication by enhancing the local bitline circuitry, enabling associativity-agnostic operations as well as in-place shifting inside local bitline groups. We implemented and simulated the proposed architecture in TSMC 28nm bulk CMOS technology, validating its functionality and extracting its performance, area, and energy per operation. We then designed a behavioral model of the proposed architecture to assess its performance against the latest BLADE architecture. We show 17.5% area and 22% energy reductions thanks to the proposed LG optimization. Finally, for 16-bit multiplication, we demonstrate 44% cycle count, 47% energy, and 41% performance gains over BLADE, and show that four embedded shifts offer the best trade-off between energy, area, and performance.
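The multiplication the abstract targets is typically realized in bitline-computing caches as a shift-and-add decomposition: bitwise ANDs generate partial products, which are shifted (the "embedded shifts" mentioned above) and accumulated. As a minimal illustrative sketch of that decomposition only, not the paper's circuit-level implementation:

```python
def shift_add_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned `width`-bit integers via shift-and-add,
    the operation pattern that in-cache bitline computing accelerates."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for bit in range(width):
        if (b >> bit) & 1:       # bitwise AND selects the partial product
            product += a << bit  # shift + add accumulates it
    return product

print(shift_add_multiply(1234, 5678))  # 7006652
```

In hardware, each iteration's AND can be computed in parallel across all bits of a cache row in a single bitline operation, which is why reducing the cost of the per-iteration shift and add dominates 16-bit multiplication latency.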
Keywords
associativity-agnostic in-cache computing architecture, cloud services, Internet of Things, machine learning, artificial intelligence based analytics, data-intensive services, bitline computing, data-intensive algorithms, data movement, cache hierarchy, data parallelism, in-cache computing architectures, circuit-level deficiencies, low bitcell density, data corruption risks, deep learning, data placement, local bitline circuitry, local bitline groups, BLADE architecture, high multiplication latency, TSMC CMOS bulk technology, behavioral model, performance gain, LG optimization, size 28.0 nm, word length 16.0 bit