Why Visualize Data When Coding? Preliminary Categories for Coding in Jupyter Notebooks

2022 29th Asia-Pacific Software Engineering Conference (APSEC) (2022)

Abstract
Data visualization has become a crucial component of data analytics, especially for data exploration, understanding, and analysis. Effective data visualization impacts decision-making and aids in discovering and understanding relationships. It benefits data-intensive software development tasks, e.g., feature engineering in machine learning-based software projects. However, it is unknown how visualizations are used in competitive programming. This paper reports early results on which visualizations are prevalent in competitive programming. Grandmaster is the highest of the competition ranks (novice, expert, master, and grandmaster). Analyzing the visualizations of 7 high-ranking competitors (i.e., Grandmasters) on Kaggle, we identify and present a catalog of visualizations used both to tell a story from the data and to explain the processes and pipelines behind their coding solutions. Our taxonomy includes nine types drawn from over 821 visualizations in 68 Jupyter notebooks. Furthermore, visualizations for data analysis of distribution (DA Distribution) and frequency (DA Frequency) are the most used. We envision that this catalog can help readers better understand the different situations in which to employ these visualizations.
Keywords
data visualization,machine learning competition,data analysis