ATLAS TDAQ System Administration: evolution and re-design

S Ballestrero,Sergei Dubrov, Christopher Jon Lee, A A Korol, Matthew Shaun Twomey,Alexander Bogdanchikov,Alexandru Cristian Contescu, Daniel Fazio,D A Scannicchio, F Brasolin

21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9(2015)

引用 1|浏览16
暂无评分
摘要
The ATLAS Trigger and Data Acquisition system is responsible for the online processing of live data, streaming from the ATLAS experiment at the Large Hadron Collider at CERN. The online farm is composed of similar to 3000 servers, processing the data read out from X100 million detector channels through multiple trigger levels. During the two years of the first Long Shutdown there has been a tremendous amount of work done by the ATLAS Trigger and Data Acquisition System Administrators, implementing numerous new software applications, upgrading the OS and the hardware, changing some design philosophies and exploiting the High Level Trigger farm with different purposes. The OS version has been upgraded to SLC6; for the largest part of the farm, which is composed of net booted nodes, this required a completely new design of the net booting system. In parallel, the migration to Puppet of the Configuration Management systems has been completed for both net booted and local booted hosts; the Post Boot Scripts system and Quattor have been consequently dismissed. Virtual Machine usage has been investigated and tested and many of the core servers are now running on Virtual Machines. Virtualisation has also been used to adapt the High-Level Trigger farm as a batch system, which has been used for running Monte Carlo production jobs that are mostly CPU and not I/O bound. Finally, monitoring the health and the status of similar to 3000 machines in the experimental area is obviously of the utmost importance, so the obsolete Nagios v2 has been replaced with Icinga, complemented by Ganglia as a performance data provider. This paper serves for reporting of the actions taken by the Systems Administrators in order to improve and produce a system capable of performing for the next three years of ATLAS data taking.
更多
查看译文
关键词
atlas
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要