A distributed geospatial publish/subscribe system on Apache Spark

Future Generation Computer Systems(2022)

引用 0|浏览0
暂无评分
摘要
Publish/subscribe is a messaging pattern where message producers, called publishers, publish messages which they want to be distributed to message consumers, called subscribers. Subscribers are required to subscribe to messages of interest in advance to be able to receive them upon the publishing. In this paper, we discuss a special type of publish/subscribe systems, namely geospatial publish/subscribe systems (GeoPS systems), in which both published messages (i.e., publications) and subscriptions include a geospatial object. Such an object is used to express both the location information of a publication and the location of interest of a subscription. We argue that there is great potential for using GeoPS systems for the Internet of Things and Sensor Web applications. However, existing GeoPS systems are not applicable for this purpose since they are centralized and cannot cope with multiple highly frequent incoming geospatial data streams containing publications. To overcome this limitation, we present a distributed GeoPS system in the cluster which efficiently matches incoming publications in real-time with a set of stored subscriptions. Additionally, we propose four different (distributed) replication and partitioning strategies for managing subscriptions in our distributed GeoPS system. Finally, we present results of an extensive experimental evaluation in which we compare the throughput, latency and memory consumption of these strategies. These results clearly show that they are both efficient and scalable to larger clusters. The comparison with centralized state-of-the-art approaches shows that the additional processing overhead of our distributed strategies introduced by the Apache Spark is almost negligible.
更多
查看译文
关键词
Geospatial data,Partitioning,Data replication,Big data,Data stream processing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要