谷歌浏览器插件
订阅小程序
在清言上使用

Relational database management system support for sparse data sets

Relational database management system support for sparse data sets(2006)

引用 23|浏览16
暂无评分
摘要
This dissertation describes techniques for optimizing physical storage of sparse data sets that are stored in a Relational Database Management System (RDBMS). "Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for an RDBMS. The traditional RDBMS approach to physical storage assumes that rows of tables have few null values, and that most tables will have tens of columns, and never more than a few hundred columns. Sparse data sets challenge both assumptions by having hundreds to thousands of columns in a table and rows with mostly null values.In the space of physical storage optimization, we consider new techniques for physical record storage, providing indexing support over the data, and vertically partitioning the data. We analyze a row storage format called interpreted storage and show that it is a better than the traditional positional storage format for sparse data. For indexing, we show that it is feasible to index most, if not all, attributes in a sparse data set and efficiently maintain all of the indexes by utilizing sparse indexes, which do not index null values.We continue our study by analyzing real-world sparse data sets and identify that the data contains frequently co-occur attributes within the rows of the data. We recognize that vertical partitioning can exploit these patterns to store a sparse data set in smaller tables. The small tables provide faster scan times over groups of co-occurring attributes than scanning the entire data set. We isolate the tradeoffs in choosing groups of attributes for vertical partitions, and suggest, a method that uses k-nearest neighbors for mining the patterns in the data. We integrated all of our techniques into the open-source database management system PostgreSQL. The integration illustrates the practicality and viability of implementing our techniques in other systems. We conclude with an overview of our results and directions for future work.
更多
查看译文
关键词
physical storage optimization,row storage format,real-world sparse data set,entire data,sparse data set,sparse data,relational database management system,sparse index,physical record storage,null value,physical storage
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要