PyPads

Datenbank-Spektrum(2024)

引用 0|浏览1
暂无评分
摘要
Despite algorithmic advancements in the field of machine learning, a need for improvement in the infrastructure supporting machine learning development and research has become increasingly apparent. Machine learning experiments usually tend to be more ad-hoc in nature, and results are communicated most often in the form of a publication. Experimental details are often omitted due to size or time constraints, or simply because the complexity in terms of technical setup or parametrization became intractable. Even access to code bases, disregard important properties of the environment and experimental setup, like for example random generators or computing infrastructure. At the same time, tracking and communicating an often inherently exploratory scientific process is a task with considerable effort. We explored different venues to tackle these issues from a data science engineering point of view. The efforts resulted in PyPads, a framework providing an infrastructure to extend experimental setups with logging, communication and analysis features in a mostly non-intrusive way. PyPads can be extended to different Python-based frameworks, utilizing community driven, descriptive metadata in an effort to harmonize library specific logs in an ontology. Meanwhile, we also try to emphasize similarities to practices in software engineering, which have turned out to be essential in practical applications.
更多
查看译文
关键词
Machine Learning,Reproducibility,Open Science,Automated Logging,Python
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要