Bolt-on, Compact, and Rapid Program Slicing for Notebooks [Scalable Data Science].

Very Large Data Bases Conference (VLDB), 2022

Abstract
Computational notebooks are commonly used for iterative workflows, such as in exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about code used to generate various notebook data artifacts is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to contain relevant code for artifacts) and conservative (containing unnecessary code for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer's ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
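
To make the two slice directions concrete, the following is a minimal Python sketch, assuming the cell-level data dependencies have already been resolved. It is not nbslicer's implementation: the `deps` mapping, the function names, and the example session are all hypothetical, and the sketch ignores how the dependencies are discovered, which is the part the paper's dynamic instrumentation addresses.

```python
from collections import defaultdict, deque


def build_edges(deps):
    """Turn {cell: set of cells it reads from} into parent and child maps."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for cell, sources in deps.items():
        for src in sources:
            parents[cell].add(src)
            children[src].add(cell)
    return parents, children


def reachable(start, edges):
    """Breadth-first traversal; returns the start cell plus everything reachable."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


def backward_slice(cell, deps):
    """Cells that must be run to reproduce `cell` (its historical dependencies)."""
    parents, _ = build_edges(deps)
    return reachable(cell, parents)


def forward_slice(cell, deps):
    """Cells affected when `cell` is rerun."""
    _, children = build_edges(deps)
    return reachable(cell, children)


# Hypothetical session: cell 1 loads data, cell 2 cleans it,
# cell 3 is an unrelated experiment, cell 4 plots the cleaned data.
deps = {1: set(), 2: {1}, 3: set(), 4: {2}}

plot_lineage = backward_slice(4, deps)  # cells needed to reproduce the plot
rerun_targets = forward_slice(1, deps)  # cells to re-execute after rerunning cell 1
```

In this toy session the unrelated cell 3 appears in neither slice, which is the kind of superfluous code a precise slicer is meant to filter out.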
Keywords
rapid program slicing, notebooks