Diagnosing Performance Problems in Parallel File Systems

msra (2009)

Abstract
This work describes and compares two black-box approaches, one using syscall statistics and the other using OS-level performance metrics, to automatically diagnose different performance problems in parallel file systems. Both approaches rely on peer-comparison diagnosis: they compare statistical attributes of relevant metrics across servers in order to indict the culprit node. An observation-based checklist is developed to identify the faulty resource from the metrics that are affected, and it is demonstrated that this checklist applies commonly across stripe-based parallel file systems. These approaches are demonstrated for four realistic problems (disk-hog, disk-busy, network-hog, and packet-loss) injected into three different file-system benchmarks (dd, postmark, and IOzone) in PVFS and Lustre clusters.
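The peer-comparison idea can be illustrated with a minimal sketch. It assumes a single metric sampled on each server, that most servers are fault-free, and a simple median-deviation rule with a hypothetical absolute `threshold` in the metric's own units. The function name `indict_culprits`, the server names, and the sample values are illustrative only; this is not the authors' algorithm or data.

```python
from statistics import median


def indict_culprits(metric_by_server: dict[str, list[float]], threshold: float) -> list[str]:
    """Flag servers whose typical metric value deviates from their peers.

    Hypothetical sketch of peer comparison: it assumes the majority of
    servers are fault-free, so the peer median acts as "normal" behavior.
    """
    # Summarize each server's samples with a robust statistic (the median).
    per_server = {server: median(samples) for server, samples in metric_by_server.items()}
    # Peer baseline: the median of the per-server medians, which a single
    # faulty server cannot pull far toward itself.
    baseline = median(per_server.values())
    # Indict every server whose summary differs from the baseline by more than threshold.
    return sorted(
        server for server, value in per_server.items()
        if abs(value - baseline) > threshold
    )


if __name__ == "__main__":
    # Hypothetical per-server samples of one metric (e.g. disk wait time in ms).
    samples = {
        "s1": [5.0, 6.0, 5.0],
        "s2": [5.0, 5.0, 6.0],
        "s3": [6.0, 5.0, 5.0],
        "s4": [40.0, 42.0, 39.0],  # behaves unlike its peers
    }
    print(indict_culprits(samples, threshold=10.0))
```

Using the median of per-server medians as the baseline keeps one misbehaving server from shifting the reference point, which is what lets it stand out as the outlier. With only two servers, or when most servers are affected, this simple rule cannot tell which side is faulty.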
Keywords
packet loss