Diagnosing missing events in distributed systems with negative provenance

SIGCOMM(2014)

引用 94|浏览144
暂无评分
摘要
When debugging a distributed system, it is sometimes necessary to explain the absence of an event - for instance, why a certain route is not available, or why a certain packet did not arrive. Existing debuggers offer some support for explaining the presence of events, usually by providing the equivalent of a backtrace in conventional debuggers, but they are not very good at answering 'Why not?' questions: there is simply no starting point for a possible backtrace. In this paper, we show that the concept of negative provenance can be used to explain the absence of events in distributed systems. Negative provenance relies on counterfactual reasoning to identify the conditions under which the missing event could have occurred. We define a formal model of negative provenance for distributed systems, and we present the design of a system called Y! that tracks both positive and negative provenance and can use them to answer diagnostic queries. We describe how we have used Y! to debug several realistic problems in two application domains: software-defined networks and BGP interdomain routing. Results from our experimental evaluation show that the overhead of Y! is moderate.
更多
查看译文
关键词
debugging,diagnostics,network management,provenance
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要