Machine Learning Systems are Bloated and Vulnerable
CoRR(2022)
Abstract
Today's software is bloated with both code and features that are not used by
most users. This bloat is prevalent across the entire software stack, from
operating systems and applications to containers. Containers are lightweight
virtualization technologies used to package code and dependencies, providing
portable, reproducible and isolated environments. For their ease of use, data
scientists often utilize machine learning containers to simplify their
workflow. However, this convenience comes at a cost: containers are often
bloated with unnecessary code and dependencies, resulting in very large sizes.
In this paper, we analyze and quantify bloat in machine learning containers. We
develop MMLB, a framework for analyzing bloat in software systems, focusing on
machine learning containers. MMLB measures the amount of bloat at both the
container and package levels, quantifying the sources of bloat. In addition,
MMLB integrates with vulnerability analysis tools and performs package
dependency analysis to evaluate the impact of bloat on container
vulnerabilities. Through experimentation with 15 machine learning containers
from TensorFlow, PyTorch, and Nvidia, we show that bloat accounts for up to 80%
of machine learning container sizes, increasing container provisioning times by
up to 370%