Mira: Sharing Resources for Distributed Analytics at Small Timescales

2018 IEEE International Conference on Big Data (Big Data) (2019)

Abstract
Modern distributed analytics stacks consist of application frameworks that enable processing of large amounts of data, and a resource manager that allows applications to share computational resources. The initial use case for these systems was running batch jobs with long lifetimes (e.g., a few hours). Since their inception, however, new use cases have emerged in which users increasingly rely on them to gain insight interactively, or even online. Efficiently sharing resources under these additional use cases requires operating at smaller timescales (minutes or even seconds) than the existing systems were designed for and are capable of.

In this paper, we present Mira, a system for optimized elastic execution of short-running and interactive data-analytics applications on shared clusters, offering low-latency execution startup, fast resource management, and efficient resource utilization. We analyze the resource-sharing overheads in a commonly used distributed processing stack (Spark+YARN) and reveal opportunities to accelerate applications in shared environments. Our experiments show that Mira reduces resource-sharing-related overheads by more than 400× and reduces application runtime by up to 4.2×.
Keywords
shared clusters, shared environments, Mira, resource sharing, small timescales, modern distributed analytics, resource manager, interactive data-analytics applications, resource management, resource utilization, distributed processing stack, Spark+YARN