Database-Backed Program Analysis for Finding Cascading Outage Bugs in Distributed Systems

Semantic Scholar (2021)

Abstract
Modern distributed systems (“cloud systems”) have emerged as a dominant backbone for many of today’s applications. As these systems collectively become the “cloud operating system”, users expect high dependability, including performance stability and availability. Small jitters in system performance or minutes of service downtime can have a huge impact on a company and on user satisfaction. We try to improve cloud system availability by detecting and eliminating cascading outage bugs (CO bugs). A CO bug is a bug that can cause simultaneous or cascading failures of each of the individual nodes in the system, which eventually leads to a major outage. While hardware arguably is no longer a single point of failure, our large-scale studies of cloud bugs and outages reveal that CO bugs have emerged as a new class of outage-causing bugs and a single point of failure in the software. We address the CO bug problem with the Cascading Outage Bugs Elimination (COBE) project. In this project, we (1) study the anatomy of CO bugs and (2) develop CO-bug detection tools to unearth CO bugs.