PreDatA – preparatory data analytics on peta-scale machines

IPDPS (2010)

Abstract
Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high-performance storage, and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists want to gain insight into selected data characteristics 'hidden' or 'latent' in these massive datasets while the data is still being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and by staging the simulations' output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving the data into file systems and storage. The PreDatA middleware supports such in-transit manipulations through asynchronous data movement to reduce write latency, application-specific operations on streaming data that can discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulations.
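The abstract names three in-transit services: asynchronous data movement, streaming analysis of output data, and data reorganization with metadata annotation. The following is a minimal single-process sketch of how those pieces fit together, assuming a Python thread stands in for a staging node and a `queue.Queue` stands in for the network transfer from compute nodes. The class names (`StagingNode`, `RunningStats`, `ChunkMetadata`), the running mean/min/max statistic, and the bin-sorted layout with per-bin offsets are hypothetical choices for illustration, not PreDatA's actual API or operators.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ChunkMetadata:
    """Hypothetical per-chunk annotation stored alongside reorganized data."""
    step: int
    count: int
    minimum: float
    maximum: float
    bin_offsets: dict  # bin id -> (start, end) slice into the sorted chunk


@dataclass
class RunningStats:
    """Streaming mean/min/max over every value seen so far (incremental mean)."""
    n: int = 0
    mean: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, values):
        for v in values:
            self.n += 1
            self.mean += (v - self.mean) / self.n
            self.minimum = min(self.minimum, v)
            self.maximum = max(self.maximum, v)


class StagingNode:
    """Hypothetical stand-in for a staging node (not the PreDatA middleware API)."""

    def __init__(self, bin_width=10.0):
        self.bin_width = bin_width
        self.inbox = queue.Queue()  # stands in for compute-to-staging transfer
        self.stats = RunningStats()
        self.stored = []  # (sorted_chunk, metadata) pairs, in place of file writes
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write_async(self, step, values):
        # The simulation only enqueues a copy of its (non-empty) output and
        # returns immediately; analysis and storage happen on the worker thread.
        self.inbox.put((step, list(values)))

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:  # shutdown sentinel
                break
            step, values = item
            self.stats.update(values)  # streaming characterization
            self.stored.append(self._reorganize(step, values))

    def _reorganize(self, step, values):
        # Sorting by value makes every bin a contiguous slice, so a later
        # range query can read only the slices its metadata points to.
        sorted_vals = sorted(values)
        offsets = {}
        for i, v in enumerate(sorted_vals):
            b = int(v // self.bin_width)
            start, _ = offsets.get(b, (i, i))
            offsets[b] = (start, i + 1)
        meta = ChunkMetadata(step, len(values), sorted_vals[0], sorted_vals[-1], offsets)
        return sorted_vals, meta

    def close(self):
        # Drain the queue, then stop the worker.
        self.inbox.put(None)
        self._worker.join()


if __name__ == "__main__":
    node = StagingNode(bin_width=10.0)
    for step in range(3):
        node.write_async(step, [step * 5.0 + x for x in (12.0, 3.0, 27.0, 18.0)])
    node.close()
    print(node.stats)
    for chunk, meta in node.stored:
        print(meta.step, chunk, meta.bin_offsets)
```

The design choice the sketch highlights is that `write_async` returns as soon as the data is handed off, so the simulation's write latency no longer includes sorting, indexing, or analysis; those costs move onto the staging side, which is the trade-off the abstract describes.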
Keywords
data reorganization,data access,data preprocessing,streaming data,metadata annotation,data manipulations,high performance storage,data exchange,peta-scale scientific applications,preparatory data analytics,storage management,data inspection,hec platforms,data analysis,write latency,large scale simulations,electronic data interchange,data visualisation,asynchronous data movement,application-specific operations,file storage,high end computing platforms,predata middleware,data visualization,middleware,runtime data analysis,i/o stack,meta data,data presentation,file systems,peta-scale machines,indexation,simulation model