NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks

56th IEEE/ACM International Symposium on Microarchitecture (MICRO 2023)

Abstract
Longest Prefix Match (LPM) engines are widely used in computer systems, and especially in modern network devices such as Network Interface Cards (NICs), switches, and routers. However, existing LPM hardware fails to scale to the millions of rules required by modern systems, is often optimized for specific applications, and is therefore performance-sensitive to the structure of the LPM rules. We describe NeuroLPM, a new architecture for multi-purpose LPM hardware that replaces queries in traditional memory-intensive trie and hash-table data structures with inference in a lightweight neural-network-based model called RQRMI. NeuroLPM scales to millions of rules under a small on-die SRAM budget and achieves stable, rule-structure-agnostic performance, allowing its use in a variety of applications. We solve several unique challenges in implementing RQRMI inference in hardware, including minimizing the amount of floating-point computation while maintaining query correctness, and scaling the rule-set size while ensuring small, deterministic off-chip memory bandwidth. We prototype NeuroLPM in Verilog and evaluate it on real-world packet-forwarding rule-sets and network traces. NeuroLPM offers substantial scalability benefits without any application-specific optimizations. For example, it is the only algorithm that can serve a 950K-rule set at an average of 196M queries per second with 4.5MB of SRAM, within 2% of the best-case throughput that the state-of-the-art Tree Bitmap and SAIL achieve on smaller rule-sets. With 2MB of SRAM, it reduces the DRAM bandwidth per query, the dominant performance factor, by up to 9x and 3x compared to these state-of-the-art designs.
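
To make the learned-index idea concrete, below is a minimal Python sketch of the approach the abstract describes: the prefix table is flattened into disjoint address ranges, and a model predicts where a query address falls in the sorted range array, leaving only a small, bounded local search. This is illustrative only; it uses a single linear model with a measured worst-case error as a stand-in for the paper's multi-stage RQRMI, runs in plain Python floats rather than the paper's reduced floating-point hardware pipeline, and all function names are hypothetical.

import bisect

def prefix_to_interval(prefix, plen):
    # [lo, hi] address span of an IPv4 "prefix/plen" rule (prefix as a 32-bit int).
    lo = prefix & ~((1 << (32 - plen)) - 1)
    hi = lo | ((1 << (32 - plen)) - 1)
    return lo, hi

def build_ranges(rules):
    # Flatten (prefix, plen, value) rules into disjoint address ranges, each
    # labelled with the value of its longest covering prefix. Quadratic and
    # purely illustrative; a real build step would be incremental.
    points = {0, 1 << 32}
    for p, l, _ in rules:
        lo, hi = prefix_to_interval(p, l)
        points.update((lo, hi + 1))
    bounds = sorted(points)
    labels = []
    for lo, nxt in zip(bounds, bounds[1:]):
        best = None  # (plen, value) of the longest rule covering [lo, nxt)
        for p, l, v in rules:
            rlo, rhi = prefix_to_interval(p, l)
            if rlo <= lo and nxt - 1 <= rhi and (best is None or l > best[0]):
                best = (l, v)
        labels.append(best[1] if best else None)
    return bounds, labels

def fit_linear(bounds):
    # One linear model mapping address -> approximate range index, plus its
    # measured worst-case index error (a stand-in for training RQRMI).
    n = len(bounds) - 1
    a = n / float(1 << 32)
    err = max(abs(int(a * b) - i) for i, b in enumerate(bounds))
    return a, err

def lpm(addr, bounds, labels, a, err):
    # Predict the range index, then binary-search only inside the error
    # window: a small, bounded number of memory accesses per query.
    guess = int(a * addr)
    lo = max(0, guess - err - 1)
    hi = min(len(bounds), guess + err + 2)
    return labels[bisect.bisect_right(bounds, addr, lo, hi) - 1]

Example usage, with 10.1.0.0/16 correctly winning over 10.0.0.0/8:

rules = [(0x0A000000, 8, "10/8"), (0x0A010000, 16, "10.1/16"), (0x00000000, 0, "default")]
bounds, labels = build_ranges(rules)
a, err = fit_linear(bounds)
print(lpm(0x0A010203, bounds, labels, a, err))  # -> "10.1/16"

On this tiny, skewed table the single linear model degrades to a near-full binary search; the point of a multi-stage model such as RQRMI is to shrink the error window, and with it the off-chip memory traffic per query, which is the quantity the abstract's DRAM-bandwidth comparison targets.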
Keywords
Longest Prefix Matching, Throughput, Data Structure, Hash Function, Network Routing, Structural Rules, Memory Bandwidth, Packet Forwarding, Off-chip Memory, Prediction Error, Training Time, Set Of Rules, Per Cycle, State Machine, Load Balancing, Caching, Piecewise Linear Function, Neural Net, Binary Search, Hundreds Of Milliseconds, Memory Bank, Matching Rule, Cache Hit, Internal Data Structure, Multiple Queries, On-chip Memory, Number Of Memory, Input Domain, Cache Size, Critical Path