AquaSee: Predict Load and Cooling System Faults of Supercomputers Using Chilled Water Data

Source: Journal of Computer Science and Technology (English edition)
An analysis of real-world operational data from the Tianhe-1A (TH-1A) supercomputer system shows that chilled water data not only reflect the status of the chiller system but are also related to supercomputer load. This study proposes AquaSee, a method that predicts the load and cooling system faults of supercomputers from chilled water pressure and temperature data. The method is validated on real-world operational data from the TH-1A supercomputer system at the National Supercomputer Center in Tianjin. Prediction models are built from datasets with various compositions and with different prediction sequence lengths. Experimental results show that the method using a combination of pressure and temperature data performs better than one using only pressure or only temperature data. The best inference sequence length is two points. Furthermore, an anomaly monitoring system based on chilled water data is set up to help engineers detect chiller system anomalies.
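To make the idea of a two-point input sequence combining pressure and temperature concrete, the following is a minimal sketch of how such inputs might be assembled. It is not the authors' implementation: the abstract does not describe the model or the data layout, so the function name `make_windows`, the interleaved feature order, and the choice of the next-step load as the prediction target are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    pressure: Sequence[float],
    temperature: Sequence[float],
    load: Sequence[float],
    seq_len: int = 2,
) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `seq_len` chilled water samples with the load that follows it.

    Assumption (not from the paper): the three series are sampled at the same
    instants, and the target is the load one step after the window ends.
    """
    if not (len(pressure) == len(temperature) == len(load)):
        raise ValueError("all series must have the same length")
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")

    features: List[List[float]] = []
    targets: List[float] = []
    for end in range(seq_len, len(load)):
        window: List[float] = []
        # Interleave pressure and temperature for each time step in the window.
        for i in range(end - seq_len, end):
            window.extend([pressure[i], temperature[i]])
        features.append(window)
        targets.append(load[end])
    return features, targets
```

With the default `seq_len=2`, each feature vector holds four values (two pressure and two temperature readings). The resulting pairs could be passed to any regression model; dropping either series from the window would give the pressure-only or temperature-only variants that the abstract compares against.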