Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications
CoRR (2024)
Abstract
As more scientific fields rely on neural networks (NNs) to process data
arriving at extreme throughputs and under extreme latency constraints, it is
crucial to develop NNs whose parameters are all stored on-chip. In many of
these applications, there is not enough time to go off-chip and retrieve
weights. Moreover, off-chip memory such as DRAM does not have the bandwidth
required to process these NNs as fast as the data is being produced (e.g.,
every 25 ns). These extreme latency and bandwidth requirements therefore have
architectural implications for the hardware intended to run these NNs: 1) all
NN parameters must fit on-chip, and 2) codesigning custom/reconfigurable logic
is often required to meet these latency and bandwidth constraints. In our
work, we show that many scientific NN applications must run fully on-chip, in
the extreme case requiring a custom chip to meet such stringent constraints.
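
The bandwidth argument above can be made concrete with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the 25 ns interval comes from the abstract, while the model size (100,000 one-byte parameters) and the off-chip bandwidth (100 GB/s) are hypothetical placeholders, not figures from the paper. It assumes every parameter would have to be re-fetched from off-chip memory once per inference.

```python
def required_bandwidth(num_params: int, bytes_per_param: float, interval_s: float) -> float:
    """Bytes per second needed to re-fetch every parameter once per interval."""
    return num_params * bytes_per_param / interval_s


def bytes_per_interval(bandwidth_bytes_per_s: float, interval_s: float) -> float:
    """Bytes that a memory link of the given bandwidth can deliver in one interval."""
    return bandwidth_bytes_per_s * interval_s


INTERVAL_S = 25e-9             # one new input every 25 ns (from the abstract)
NUM_PARAMS = 100_000           # hypothetical model size (assumption)
BYTES_PER_PARAM = 1            # hypothetical 8-bit weights (assumption)
OFFCHIP_BW = 100e9             # hypothetical 100 GB/s off-chip link (assumption)

needed = required_bandwidth(NUM_PARAMS, BYTES_PER_PARAM, INTERVAL_S)
budget = bytes_per_interval(OFFCHIP_BW, INTERVAL_S)

print(f"Bandwidth needed to stream all weights: {needed / 1e12:.1f} TB/s")
print(f"Bytes deliverable off-chip per interval: {budget:.0f}")
print(f"Shortfall factor: {needed / OFFCHIP_BW:.0f}x")
```

Under these assumed numbers, streaming the weights would take about 4 TB/s, roughly 40 times the assumed link bandwidth, and only about 2,500 bytes could arrive within a single 25 ns interval. Different model sizes or memory technologies change the figures, but the gap shows why on-chip parameter storage is treated as a requirement.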