Fault Injection Based Interventional Causal Learning for Distributed Applications.

AAAI (2023)

Abstract
We apply the machinery of interventional causal learning with programmable interventions to the domain of application management. Modern applications are modularized into interdependent components or services (e.g., microservices) for ease of development and management. The communication graph among these components is determined by the application code and is not always known to the platform provider. Our solution learns this unknown communication graph solely from application logs observed during execution, by injecting faults in a staging environment. Specifically, we have developed an active (interventional) causal learning algorithm that uses the observations obtained during fault injections to learn a model of error propagation in the communication among the components. The "power of intervention" additionally allows us to address confounders arising from unobserved user interactions. We demonstrate the effectiveness of our solution in learning the communication graphs of well-known microservice application benchmarks. We also show its efficacy on a downstream fault-localization task, in which the learned graph helps localize faults at runtime in a production environment (where the location of the fault is unknown). Finally, we briefly discuss the implementation and deployment status of a fault injection framework that incorporates the developed technology.
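To make the idea of learning a communication graph through interventions more concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes (hypothetically) that a fault in one service causes that service and every service that transitively calls it to log errors, that the true call graph is acyclic with no redundant shortcut edges, and that there are no confounding background errors. All names (`errors_under_fault`, `learn_graph`, `localize`, and the four example services) are illustrative.

```python
from typing import Callable, Dict, Iterable, Set

CallGraph = Dict[str, Set[str]]  # caller -> set of direct callees


def errors_under_fault(calls: CallGraph, faulty: str) -> Set[str]:
    """Hypothetical simulated run: the faulty service and every service
    that transitively calls it log errors."""
    erroring = {faulty}
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            if caller not in erroring and callees & erroring:
                erroring.add(caller)
                changed = True
    return erroring


def learn_graph(services: Iterable[str],
                observe: Callable[[str], Set[str]]) -> CallGraph:
    """Inject one fault per service, then keep only direct caller->callee edges."""
    services = list(services)
    # affected[s]: services other than s that log errors when s is faulted
    affected = {s: observe(s) - {s} for s in services}
    learned: CallGraph = {s: set() for s in services}
    for callee in services:
        for caller in affected[callee]:
            # Transitive reduction (valid for DAGs): drop the edge if the
            # caller's errors are explained by an intermediate service u.
            indirect = any(caller in affected[u]
                           for u in affected[callee] if u != caller)
            if not indirect:
                learned[caller].add(callee)
    return learned


def localize(learned: CallGraph, observed: Set[str]) -> str:
    """Pick the candidate root cause whose predicted error set best matches
    the observed one (Jaccard similarity)."""
    def score(candidate: str) -> float:
        predicted = errors_under_fault(learned, candidate)
        return len(predicted & observed) / len(predicted | observed)
    return max(learned, key=score)


# Illustrative ground truth, hidden from the learner except through observe().
TRUE_CALLS: CallGraph = {
    "frontend": {"cart", "catalog"},
    "cart": {"db"},
    "catalog": {"db"},
    "db": set(),
}

learned = learn_graph(TRUE_CALLS, lambda s: errors_under_fault(TRUE_CALLS, s))
print(learned)
print(localize(learned, {"frontend", "cart", "catalog", "db"}))
```

Each injection reveals the set of services whose logs turn erroneous, which here is the fault's transitive callers. A transitive reduction then keeps only the direct edges. The localization step reuses the learned graph to predict the error footprint of each candidate fault. A real system would also need repeated runs and a comparison against fault-free baselines to separate injected effects from errors caused by confounding user traffic. This sketch omits that step.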
Keywords
distributed applications, fault, learning, injection