GeoFlink: An Efficient and Scalable Spatial Data Stream Management System

IEEE Access (2022)

Abstract
The spread of GPS-enabled devices has led to exponential growth in spatial data. If processed in a timely manner, spatial data can be of great value to businesses, governments, and NGOs. Spatial data is voluminous and is usually generated as a continuous stream, for instance, vehicle tracking data and mobile location data. Processing such large data streams requires highly scalable systems. Apache Spark Streaming, Apache Flink, and Apache Samza are among the state-of-the-art scalable stream processing platforms; however, they lack support for spatial objects, indexes, and queries. Meanwhile, scalable spatial data processing platforms such as GeoSpark and SpatialHadoop do not support streaming workloads and can handle only static or batch data. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial objects, indexes, and continuous queries over spatial data streams. A grid-based index is introduced to enable efficient spatial query processing and effective data distribution across the nodes of a distributed cluster. GeoFlink supports spatial range, spatial kNN, and spatial join queries on Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon objects. In addition, GeoFlink accepts data streams in GeoJSON, WKT, and CSV formats. A detailed experimental study on real and synthetic spatial data streams shows that GeoFlink achieves significantly higher query throughput than existing state-of-the-art streaming platforms.
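To make the idea of a grid-based index concrete, here is a minimal Python sketch of a uniform grid used to answer a range query over points. It is an illustration under stated assumptions, not GeoFlink's implementation or API: the names `UniformGrid`, `cell_key`, and `range_query`, the bounding box, and the cell count are all hypothetical. Points are mapped to cells; cells lying entirely outside the query circle are pruned, cells lying entirely inside it are accepted without per-point distance checks, and only cells straddling the boundary are checked point by point.

```python
import math
from collections import defaultdict


class UniformGrid:
    """Hypothetical uniform grid over a fixed rectangular bounding box."""

    def __init__(self, min_x, min_y, max_x, max_y, cells_per_side):
        self.min_x, self.min_y = min_x, min_y
        self.max_x, self.max_y = max_x, max_y
        self.n = cells_per_side
        self.cell_w = (max_x - min_x) / cells_per_side
        self.cell_h = (max_y - min_y) / cells_per_side

    def cell_of(self, x, y):
        """Map a point to its (col, row) cell; points on the max edge go to the last cell."""
        if not (self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y):
            raise ValueError("point lies outside the grid's bounding box")
        col = min(int((x - self.min_x) / self.cell_w), self.n - 1)
        row = min(int((y - self.min_y) / self.cell_h), self.n - 1)
        return col, row

    def cell_key(self, x, y):
        """Single integer cell id, usable as a partitioning key (assumption)."""
        col, row = self.cell_of(x, y)
        return row * self.n + col

    def cell_bounds(self, col, row):
        """Return (x0, y0, x1, y1) of a cell."""
        x0 = self.min_x + col * self.cell_w
        y0 = self.min_y + row * self.cell_h
        return x0, y0, x0 + self.cell_w, y0 + self.cell_h


def _min_dist(qx, qy, x0, y0, x1, y1):
    """Smallest distance from the query point to any point of the cell rectangle."""
    dx = max(x0 - qx, 0.0, qx - x1)
    dy = max(y0 - qy, 0.0, qy - y1)
    return math.hypot(dx, dy)


def _max_dist(qx, qy, x0, y0, x1, y1):
    """Largest distance from the query point to the cell rectangle (farthest corner)."""
    dx = max(abs(qx - x0), abs(qx - x1))
    dy = max(abs(qy - y0), abs(qy - y1))
    return math.hypot(dx, dy)


def range_query(grid, points, qx, qy, r):
    """Return all points within distance r of (qx, qy), using the grid to prune cells."""
    buckets = defaultdict(list)
    for p in points:
        buckets[grid.cell_of(*p)].append(p)

    result = []
    for (col, row), pts in buckets.items():
        bounds = grid.cell_bounds(col, row)
        if _min_dist(qx, qy, *bounds) > r:
            continue  # cell entirely outside the query circle: prune it
        if _max_dist(qx, qy, *bounds) <= r:
            result.extend(pts)  # cell entirely inside: no per-point check needed
        else:
            # boundary cell: check each point's exact distance
            result.extend(p for p in pts if math.hypot(p[0] - qx, p[1] - qy) <= r)
    return result


if __name__ == "__main__":
    grid = UniformGrid(0.0, 0.0, 10.0, 10.0, 10)
    pts = [(1.0, 1.0), (1.5, 1.5), (5.0, 5.0), (9.0, 9.0), (2.0, 1.0)]
    print(sorted(range_query(grid, pts, 1.0, 1.0, 1.0)))
```

In a distributed stream processor, the integer from `cell_key` could serve as the key by which records are partitioned, so that points in the same cell reach the same worker; the abstract does not specify how GeoFlink maps cells to cluster nodes, so that mapping is an assumption here.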
Keywords
GeoFlink, spatial data, geospatial, stream processing, spatial data management system, spatial index, spatial objects