PADMINI: A Peer-to-Peer Distributed Astronomy Data Mining System and a Case Study

CIDU (2010)

Abstract
Peer-to-Peer (P2P) networks are appealing for astronomy data mining from virtual observatories because of the large volume of the data, the compute-intensive tasks, the potentially large number of users, and the distributed nature of the data analysis process. This paper offers a brief overview of PADMINI, a Peer-to-Peer Astronomy Data MINIng system, and presents a case study on using PADMINI for distributed outlier detection on astronomy data. PADMINI is a web-based system powered by Google Sky and by distributed data mining algorithms that run on a collection of computing nodes. The case study evaluates the architecture and the performance of the overall system, and detailed experimental results are presented to document its utility and scalability.

As the amount of data available at geographically distributed sources grows rapidly, traditional centralized techniques for data analytics are proving insufficient for handling this data avalanche. Astronomy research, which relies primarily on data from various sky surveys, is one example that presents such challenges. Downloading and processing all the data at a single location increases both communication and infrastructure costs. Moreover, centralized approaches cannot fully exploit the power of emerging distributed computing networks such as Peer-to-Peer (P2P) user networks. An alternative is to distribute such computationally intensive tasks among participating nodes, which may themselves be geographically distributed. This calls for data mining solutions that pay careful attention to resource consumption in a distributed environment. This paper particularly considers P2P networks for building such solutions, and reports a case study of PADMINI (Peer-to-Peer Astronomy Data MINIng system).
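The abstract does not spell out which outlier-detection algorithm PADMINI runs. To make the communication argument concrete, here is a minimal sketch assuming a simple global z-score detector (our substitution, not necessarily PADMINI's method): each node ships only three numbers (count, sum, sum of squares) instead of its raw data, the summaries are merged into a global mean and standard deviation, and each node then flags its own points locally. All names (`LocalStats`, `local_stats`, `merge`, `global_mean_std`, `local_outliers`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalStats:
    """Sufficient statistics a node shares instead of its raw data (hypothetical)."""
    count: int
    total: float
    total_sq: float


def local_stats(values):
    # Computed on each node over its local data partition only.
    return LocalStats(len(values), sum(values), sum(v * v for v in values))


def merge(stats):
    # Merging is associative, so it could run at one coordinator or be
    # aggregated hop by hop across peers; only three numbers per node move.
    return LocalStats(
        sum(s.count for s in stats),
        sum(s.total for s in stats),
        sum(s.total_sq for s in stats),
    )


def global_mean_std(merged):
    # Population mean and standard deviation from the merged summary.
    mean = merged.total / merged.count
    var = merged.total_sq / merged.count - mean * mean
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def local_outliers(values, mean, std, threshold=3.0):
    # Each node flags its own points using only the global summary.
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]
```

For example, two nodes holding `[10.0, 10.2, 9.8, 10.1]` and `[9.9, 10.0, 25.0]` exchange one `LocalStats` each; with `threshold=2.0` only the second node reports `25.0`. The sum-of-squares variance is the simplest mergeable form but can lose precision on large, tightly clustered data; a pairwise (Chan-style) mean/variance merge is a more robust drop-in.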
Unlike centralized data mining systems, PADMINI is a web-based system powered by various distributed data mining algorithms that run on a collection of computing nodes forming a Peer-to-Peer (P2P) network. It is an easy-to-use, scalable system for submitting astronomy jobs, in which both the collection of data for these jobs and their execution are performed in a distributed fashion. This distributed web application is designed to help astronomy researchers and hobbyists analyze data from Astronomy Virtual Observatories (VOs). The back-end distributed computation network supports two frameworks: the Distributed Data Mining Toolkit (DDMT) and Hadoop.

The rest of the paper is organized as follows. Section 2 presents the motivation behind building the PADMINI system and explains the specific astronomy data mining problem the paper intends to address. Section 3 briefly describes related work in P2P data mining. Section 4 gives an overview of the system architecture and describes each of its components in detail. The implementation details of the system are described in Section 5. Section 6 describes…
Keywords
data mining