Discovering Parallelisms in Python Programs

Siwei Wei, Guyang Song, Senlin Zhu, Ruoyi Ruan, Shihao Zhu,Yan Cai

PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023(2023)

引用 0|浏览0
暂无评分
摘要
Parallelization is a promising way to improve the performance of Python programs. Unfortunately, developers may miss parallelization possibilities, because they usually do not concentrate on parallelization. Many approaches have been proposed to parallelize Python programs automatically, however, they are either domain-specific or require manual annotation. Thus they cannot solve the problem well in general. In this paper, we propose PyPar, an effective tool aiming at discovering parallelization possibilities in real-world Python programs. PyPar doesn't need manual annotation and is universally applicable. It first drives a data-dependence analysis to determine whether two pieces of code can run concurrently. The key is the use of a graph-theoretic approach. Next, it adopts a dynamic selection strategy to eliminate inefficient parallelisms. Finally, PyPar produces a parallelism report as well as a referential parallelized program, which is built by PyPar using one of the three parallelization methods (thread-based, process-based, and Ray-based). We have implemented a prototype of PyPar and evaluated it on six well-designed widely-used real-world Python packages: Scikit-Image, SciPy, librosa, trimesh, Scikit-learn and seaborn. In total, 1,240 functions are tested, and PyPar found 127 parallelizable functions among them. Based on manual filtering, only 7 of them are false positives (i.e., a 94.5% precision). The remaining 120 are parallelizable (almost 10% among all functions under test), and most of them can be efficiently sped up by gaining an acceleration of up to 90%, with an average of 44%. The acceleration in practice is close to theoretical estimation. The results show that even well-designed practical Python programs can be further parallelized for speeding up, and PyPar can bring effective and efficient parallelization on real-world Python programs.
更多
查看译文
关键词
Parallelism,Python,Ray
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要