Toward Full-Coverage and Low-Overhead Profiling of Network-Stack Latency

Xiang Chen,Hongyan Liu, Wenbin Zhang,Qun Huang,Dong Zhang,Haifeng Zhou,Xuan Liu,Chunming Wu

IEEE/ACM Transactions on Networking（2024）

Cited 0|Views1

No score

Abstract

In modern data center networks (DCNs), network-stack processing denotes a large portion of the end-to-end latency of TCP flows. So profiling network-stack latency anomalies has been considered as a crucial part in DCN performance diagnosis and troubleshooting. In particular, such profiling requires full coverage (i.e., profiling every TCP packet) and low overhead (i.e., profiling should avoid high CPU consumption in end-hosts). However, existing solutions rely on system calls or tracepoints in end-hosts to implement network-stack latency profiling, leading to either low coverage or high overhead. We propose Torp, a framework that offers full-coverage and low-overhead profiling of network-stack latency. Our key idea is to offload as much of the profiling from costly system calls or tracepoints to the Torp agent built on eBPF modules, and further to include a Torp handler on the ToR switch to accelerate the remaining profiling operations. Torp efficiently coordinates the ToR switch and the Torp agent on end-hosts to jointly execute the entire latency profiling task. We have implemented Torp on $32\times 100$ Gbps Tofino switches. Testbed experiments indicate that Torp achieves full coverage and orders of magnitude lower host-side overhead compared to other solutions.

Translated text

Key words

Hardware-software coordination,latency profiling,programmable switches

AI Read Science

Must-Reading Tree

Example

Generate MRT to find the research sequence of this paper

Chat Paper

Summary is being generated by the instructions you defined