How Data Scientists Improve Generated Code Documentation in Jupyter Notebooks

IUI Workshops (2021)

Abstract
Generative AI models are capable of creating high-fidelity outputs, sometimes indistinguishable from what could be produced by human effort. However, some domains possess an objective bar of quality, and the probabilistic nature of generative models suggests that there may be imperfections or flaws in their output. In software engineering, for example, code produced by a generative model may not compile, or it may contain bugs or logical errors. Various models of human-AI interaction, such as mixed-initiative user interfaces, suggest that human effort ought to be applied to a generative model's outputs in order to improve their quality. We report results from a controlled experiment in which data scientists used multiple models – including a GNN-based generative model – to generate and subsequently edit documentation for data science code within Jupyter notebooks. In analyzing their edit patterns, we discovered various ways in which humans improved the generated documentation, and we speculate that such edit data could be used to train generative models to identify not only which parts of their output might require human attention, but also how those parts could be improved.
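The edit-pattern analysis is described only at a high level in the abstract. As a purely illustrative sketch, not the paper's method, the snippet below shows one way such edit data could be captured: diffing a generated documentation string against its human-edited version with Python's standard difflib. The example strings and the word-level tokenization are assumptions for illustration.

```python
# Illustrative only: categorize how a human edited a generated documentation cell
# by diffing the generated markdown against the edited markdown.
import difflib

generated = "Load the dataset and drop missing values."              # hypothetical model output
edited = "Load the Titanic dataset and drop rows with missing Age."  # hypothetical human revision

gen_tokens = generated.split()
edit_tokens = edited.split()

matcher = difflib.SequenceMatcher(a=gen_tokens, b=edit_tokens)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        # tag is 'replace', 'delete', or 'insert' -- a coarse edit category
        print(tag, gen_tokens[i1:i2], "->", edit_tokens[j1:j2])
```

Aggregated over many edited cells, these coarse categories give a rough picture of where human attention was applied, which is the kind of signal the authors speculate could be fed back into model training.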
Keywords
generated code documentation, Jupyter notebooks, data scientists