Smart Server Crash Prediction in Cloud Service Data Center

Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (2020)

Abstract
In recent years, cloud services have been adopted by a growing number of end customers, and a large number of applications from various businesses have been migrated to the cloud. Availability is one of the key considerations for customers adopting cloud services, so CSPs (Cloud Service Providers) pursue ever higher SLA (Service-Level Agreement) standards to meet this need. This is especially true for VM (Virtual Machine) based cloud services, where the resources of one physical server are virtualized and shared among multiple tenants, so a single server crash can severely impact the tenants' business. One solution is an effective and accurate method that predicts server crashes in advance, so that workloads can be migrated to a healthy server before the service is impacted. Accurate prediction is extremely challenging, however, because server crashes result from many kinds of failures, most of which occur randomly and suddenly.

This paper proposes a smart server crash prediction method for triggering early warning and migration in a cloud service data center. The proposed crash prediction is built on hardware, firmware and software system information, collected at levels ranging from low-level hardware indicators and kernel status up to system logs in the OS (Operating System). Machine learning algorithms are applied to log analysis and failure prediction; among the algorithms evaluated, Random Forests was chosen because it provided the best precision. The final method was deployed and evaluated in Baidu's data center, where it achieved 93.33% and 87.33% precision for minute-level and hour-level ahead-of-time warnings of server crashes, respectively.
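
The abstract reports precision for minute-level and hour-level ahead-of-time warnings but does not say how a warning is matched to a crash. The sketch below assumes one plausible protocol: a warning counts as a true positive if the same server crashes within a fixed lead-time horizon after the warning, and as a false positive otherwise, so precision is TP / (TP + FP). The `Event` type, the `ahead_of_time_precision` function, the time unit (minutes) and the horizon values are hypothetical illustrations, not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A timestamped event on one server (hypothetical; times in minutes)."""
    server_id: str
    time: float


def ahead_of_time_precision(warnings, crashes, horizon):
    """Precision of crash warnings under an assumed lead-time matching rule.

    A warning at time t is a true positive if some crash on the same server
    occurs at time c with t < c <= t + horizon; otherwise it is a false
    positive. Returns TP / (TP + FP). With no warnings, precision is
    undefined; this sketch returns 0.0 by convention.
    """
    if not warnings:
        return 0.0

    # Group crash times by server so each warning only checks its own server.
    crashes_by_server = {}
    for crash in crashes:
        crashes_by_server.setdefault(crash.server_id, []).append(crash.time)

    true_positives = 0
    for w in warnings:
        crash_times = crashes_by_server.get(w.server_id, [])
        if any(w.time < c <= w.time + horizon for c in crash_times):
            true_positives += 1
    return true_positives / len(warnings)


if __name__ == "__main__":
    warnings = [Event("s1", 0), Event("s2", 10), Event("s3", 20), Event("s1", 100)]
    crashes = [Event("s1", 5), Event("s2", 200), Event("s3", 50)]
    # Hypothetical 30-minute horizon ("minute-level" warning).
    print(ahead_of_time_precision(warnings, crashes, horizon=30))   # 0.5
    # Hypothetical 4-hour horizon ("hour-level" warning).
    print(ahead_of_time_precision(warnings, crashes, horizon=240))  # 0.75
```

In this toy data, the warning on `s2` only becomes a true positive once the horizon is long enough to reach its crash, which is why precision depends on the chosen lead time. Real evaluations may also deduplicate repeated warnings per crash; that choice is not specified in the abstract.
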
Keywords
Cloud Server, Crash, Prediction, Random Forests, Virtual Machine