Towards Framework-Independent, Non-Intrusive Performance Characterization for Dataflow Computation

Proceedings of the 10th ACM SIGOPS Asia-Pacific Workshop on Systems (2019)

Abstract
Troubleshooting performance bugs in dataflow computation is often a "painful" process, even for experienced developers. Existing approaches to configuration tuning or performance analysis are either specific to a particular framework or require code instrumentation. In this paper, we propose a framework-independent and non-intrusive approach to performance characterization. For each job, we first assemble the information provided by off-the-shelf profilers into a DAG-based execution profile. We then locate, for each DAG node (operation), the source code of its executed functions. Our key insight is that code contains learnable lexical and syntactic patterns that reveal resource information. We therefore perform code analysis and infer the operations' resource usage with machine learning classifiers. Based on these inferences, we establish a performance-resource model that correlates job performance with the resources used. An evaluation with two Spark use cases demonstrates the effectiveness of our approach in detecting program bottlenecks and predicting job completion time under various resource configurations.
Keywords
Dataflow Systems, Multi-Class Classification, Performance Characterization
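
The idea of a DAG-based execution profile annotated with per-operation resource labels can be illustrated with a minimal sketch. This is not the paper's implementation: the `Operation` class, the helper functions, the operation names, the durations, and the resource labels are all hypothetical, and the labels are hard-coded here rather than inferred by a classifier. The sketch only shows how, once each DAG node carries a duration and a resource label, one might attribute the time on the longest path to resource types and flag the dominant one as a likely bottleneck.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """One DAG node. All fields are illustrative assumptions."""
    name: str
    duration_s: float   # would come from a profiler; made-up values below
    resource: str       # would come from a classifier; hard-coded below
    parents: list = field(default_factory=list)


def critical_path(ops):
    """Return (total_time, path) for the longest path through the DAG.

    `ops` must be listed in topological order (parents before children).
    """
    best = {}  # operation name -> (finish time, path ending at that operation)
    for op in ops:
        # Start when the slowest parent finishes; sources start at time 0.
        start, prefix = max(
            (best[p] for p in op.parents),
            default=(0.0, []),
            key=lambda t: t[0],
        )
        best[op.name] = (start + op.duration_s, prefix + [op.name])
    return max(best.values(), key=lambda t: t[0])


def time_by_resource(ops, path):
    """Sum the durations of the operations on `path`, grouped by resource label."""
    by_name = {op.name: op for op in ops}
    totals = {}
    for name in path:
        op = by_name[name]
        totals[op.resource] = totals.get(op.resource, 0.0) + op.duration_s
    return totals


# Hypothetical five-operation job.
ops = [
    Operation("read", 4.0, "io"),
    Operation("map", 2.0, "cpu", ["read"]),
    Operation("filter", 1.0, "cpu", ["read"]),
    Operation("shuffle", 6.0, "network", ["map", "filter"]),
    Operation("reduce", 3.0, "cpu", ["shuffle"]),
]

total, path = critical_path(ops)
totals = time_by_resource(ops, path)
bottleneck = max(totals, key=totals.get)
print(total, path, bottleneck)
```

In this toy job the longest path runs through `read`, `map`, `shuffle`, and `reduce`, and the network-labeled `shuffle` accounts for the largest share of its time. A real performance-resource model would need to account for parallelism and resource contention, which this sketch ignores.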