Real-Time Diagnosis of Configuration Errors for Software of AI Server Infrastructure

IEEE Transactions on Dependable and Secure Computing (2023)

Abstract
Artificial intelligence (AI) server infrastructure has been built to support AI applications and handle data-intensive workloads. It is an essential building block, and errors in it may have fatal consequences for any AI application built upon it. Compared to traditional software, software for AI server infrastructure is more configurable, and thus more likely to contain configuration errors that prevent correct behavior. Previous work on misconfiguration diagnosis requires sufficient execution history or manual intervention, and can hardly diagnose potential misconfigurations that are not triggered at launch. In this paper, we propose a real-time method to address these issues. Specifically, we combine program analysis and real-time log parsing to diagnose configuration errors. Our method maps each configuration option to the log code by applying program slicing only once, and parses real-time logs during the operation of the AI server without manual intervention. We evaluate the effectiveness of our approach on the core components of Hadoop, an exemplar of AI server infrastructure software. The results show that our method mapped more than 80% of the configuration options to log outputs, identified 90% of the configuration read sites as slicing seeds, and successfully diagnosed about 10% of the configuration errors that cannot be addressed by previous studies.
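The runtime half of the approach described in the abstract, matching live log lines against an option-to-log mapping computed once in advance, can be illustrated with a minimal sketch. This is not the authors' implementation: the mapping below is hand-written rather than produced by program slicing, the log templates and the `<*>` placeholder convention are hypothetical, and only the two Hadoop option names are real.

```python
import re
from typing import Dict, Iterable, List, Pattern, Tuple

# Hypothetical result of the one-time static phase: each configuration
# option is associated with the log templates it can influence.
# The templates are invented for illustration; "<*>" marks a variable field.
OPTION_TO_TEMPLATES: Dict[str, List[str]] = {
    "dfs.replication": ["Requested replication <*> exceeds maximum <*>"],
    "dfs.namenode.handler.count": ["Starting <*> handler threads"],
}


def compile_template(template: str) -> Pattern[str]:
    """Turn a log template into a regex, with each <*> matching one token."""
    literal_parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile(r"(\S+)".join(literal_parts))


def build_index(mapping: Dict[str, List[str]]) -> List[Tuple[Pattern[str], str]]:
    """Flatten the option-to-template mapping into (regex, option) pairs."""
    return [
        (compile_template(template), option)
        for option, templates in mapping.items()
        for template in templates
    ]


def diagnose(
    log_lines: Iterable[str], index: List[Tuple[Pattern[str], str]]
) -> List[Tuple[str, str]]:
    """Return (option, log line) pairs for every line that matches a template."""
    findings = []
    for line in log_lines:
        for pattern, option in index:
            if pattern.search(line):
                findings.append((option, line))
    return findings
```

Because `diagnose` accepts any iterable of lines, it could in principle be fed from a stream that follows a growing log file, which is the sense in which the matching runs during operation rather than over a stored execution history.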
Keywords
AI server infrastructure, diagnosis, misconfiguration, slicing, static program analysis