Monitoring, Analyzing, and Controlling Internet-scale Systems with ACME

Clinical Orthopaedics and Related Research(2004)

Abstract
Analyzing and controlling large distributed services under a wide range of conditions is difficult. Yet these capabilities are essential to a number of important development and operational tasks such as benchmarking, testing, and system management. To facilitate these tasks, we have built the Application Control and Monitoring Environment (ACME), a scalable, flexible infrastructure for monitoring, analyzing, and controlling Internet-scale systems. ACME consists of two parts. ISING, the Internet Sensor In-Network agGregator, queries "sensors" and aggregates the results as they are routed through an overlay network. ENTRIE, the ENgine for TRiggering Internet Events, uses the data streams supplied by ISING, in combination with a user's XML configuration file, to trigger "actuators" such as killing processes during a robustness benchmark or paging a system administrator when predefined anomalous conditions are observed. In this paper we describe the design, implementation, and evaluation of ACME and its constituent parts. We find that for a 512-node system running atop an emulated Internet topology, ISING's use of in-network aggregation can reduce end-to-end query-response latency by more than 50% compared to using either direct network connections or the same overlay network without aggregation. We also find that an untuned implementation of ACME can invoke an actuator on one or all nodes in response to a discrete or aggregate event in less than four seconds, and we illustrate ACME's applicability to concrete benchmarking and monitoring scenarios.
Keywords
system management,overlay network,internet topology,cluster computing
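
The in-network aggregation and event triggering described in the abstract can be illustrated with a small sketch. This is not ACME's code: the `Node` class, the `aggregate` and `should_trigger` functions, the (sum, count, max) summary, and the threshold rule are all hypothetical names and choices made for illustration, and the overlay is assumed to be a simple tree. The idea shown is only that each node merges its own sensor reading with its children's partial results, so the querying node receives one compact summary rather than one reply per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical overlay node holding a single local sensor reading."""
    name: str
    reading: float
    children: list = field(default_factory=list)


def aggregate(node):
    """Return (sum, count, max) for the subtree rooted at `node`.

    Each node combines its own reading with the partial summaries of its
    children, so only one small tuple travels up each overlay link.
    """
    total, count, peak = node.reading, 1, node.reading
    for child in node.children:
        c_total, c_count, c_peak = aggregate(child)
        total += c_total
        count += c_count
        peak = max(peak, c_peak)
    return total, count, peak


def should_trigger(summary, mean_threshold):
    """Assumed trigger rule: fire when the aggregate mean exceeds a threshold."""
    total, count, _ = summary
    return total / count > mean_threshold


# Example: four nodes reporting a load percentage (made-up values).
leaf = Node("c", 90.0)
root = Node("root", 20.0, [Node("a", 40.0, [leaf]), Node("b", 50.0)])
summary = aggregate(root)
fire = should_trigger(summary, mean_threshold=45.0)
```

In a real deployment the trigger condition and the resulting action would come from the user's configuration rather than a hard-coded function; the rule above is only a stand-in for that step.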