Unboxing Default Argument Breaking Changes in Scikit Learn

João Eduardo Montandon, Luciana Lourdes Silva, Cristiano Politowski, Ghizlane El-Boussaidi, Marco Túlio Valente

2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM), 2023

Abstract
Machine Learning (ML) has revolutionized the field of computer software development, enabling data-based predictions and decision-making across several domains. Following modern software development practices, developers use third-party libraries—e.g., Scikit Learn, TensorFlow, and PyTorch—to integrate ML-based functionalities into their applications. Due to the complexity inherent in ML techniques, the models available in the APIs of these tools often require an extensive list of arguments to be set up. Library maintainers mitigate this issue by defining default values for most of these arguments so developers can use ML models in their client applications effortlessly. By relying on these defaults, client applications inadvertently depend on the values assigned to these parameters remaining stable to keep running as expected. We interpret this problem as a semantic breaking change variant, which we name Default Argument Breaking Change (DABC). In this work, we analyze 77 DABCs in Scikit Learn—a well-known ML library—and investigate how 194K client applications are vulnerable to them. Our results show that 72 DABCs (93%) are responsible for exposing 67,747 clients (35%). We also detected that most DABCs (61, 79%) involve APIs used in the ML model training and model evaluation stages. Finally, we discuss the importance of managing DABCs in third-party ML libraries and provide insights for developers to mitigate the potential impact of these changes in their applications.
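For illustration only (not an artifact of the paper), the sketch below shows how a DABC can surface in client code. It assumes the well-known case where scikit-learn 0.22 changed LogisticRegression's default solver from 'liblinear' to 'lbfgs'; clients that never specified the solver silently inherited different convergence behavior.

```python
# Minimal sketch of a Default Argument Breaking Change (DABC).
# Assumption: scikit-learn 0.22 changed LogisticRegression's default
# solver from 'liblinear' to 'lbfgs', altering convergence behavior
# for clients that relied on the implicit default.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Client code written against an older release, relying on the default
# solver; upgrading the library changes its behavior without any code change.
clf = LogisticRegression()
clf.fit(X, y)

# Pinning the argument explicitly shields the client from this DABC.
clf_pinned = LogisticRegression(solver="liblinear")
clf_pinned.fit(X, y)
```

Explicitly setting arguments whose defaults the application depends on is one of the mitigation strategies discussed for client developers.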
Keywords
breaking changes, default arguments, machine learning, scikit learn, python