Alleviating I/O Inefficiencies to Enable Effective Model Training Over Voluminous, High-Dimensional Datasets

2018 IEEE International Conference on Big Data (Big Data)

Abstract
There has been an exponential growth in data volumes in several domains, and these voluminous datasets often encompass a large number of features. Fitting models to such high-dimensional, voluminous data allows us to understand phenomena and inform decision-making. The analytics process is naturally iterative: scientists explore the set of features, data-fitting algorithms, portions of the dataspace, and each algorithm's hyperparameters as they build their models. It often takes several model-fitting attempts to arrive at a satisfactory solution, which may then be subjected to further refinements. Each of these model-building attempts is itself time-consuming and dominated by I/O and data-movement costs. In this study, we present our methodology for significantly alleviating I/O-induced inefficiencies during model training. Rather than work with the raw data, we generate and work with sketches of the data. Our framework, Fennel, is independent of the libraries or analytical engines preferred by users. Our empirical benchmarks have been performed with datasets from diverse domains (weather, epidemiology, and music), and we profile several aspects of our methodology.
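The abstract states that Fennel fits models against sketches of the data rather than the raw records, but it does not detail how those sketches are constructed. The snippet below is a minimal, hypothetical illustration of that general idea in NumPy, using a random-projection sketch as a stand-in; the dataset sizes, the projection scheme, and the least-squares model are assumptions chosen for illustration, not the paper's actual method.

    # Illustrative only: the abstract does not specify Fennel's sketching scheme.
    # A random projection is used here as a generic stand-in for "working with
    # a sketch of the data" instead of the raw, high-dimensional records.
    import numpy as np

    rng = np.random.default_rng(42)

    # Stand-in for a voluminous, high-dimensional dataset (n rows, d features).
    n, d = 10_000, 500
    X = rng.standard_normal((n, d))
    true_w = rng.standard_normal(d)
    y = X @ true_w + 0.1 * rng.standard_normal(n)

    # Build a sketch: project the d-dimensional features into k << d dimensions.
    # The projection matrix is the only extra state that must be retained.
    k = 64
    P = rng.standard_normal((d, k)) / np.sqrt(k)
    X_sketch = X @ P          # shape (n, k): far cheaper to store, move, and scan

    # Fit a model against the sketch instead of the raw data.
    w_sketch, *_ = np.linalg.lstsq(X_sketch, y, rcond=None)

    # Predictions on new data reuse the same projection.
    X_new = rng.standard_normal((10, d))
    y_pred = (X_new @ P) @ w_sketch

The intended benefit, per the abstract, is that repeated model-building attempts operate on the compact sketch held in memory rather than re-reading and moving the raw data each time, which is where the I/O and data-movement costs dominate.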
Keywords
model training,data sketches,multidimensional datasets,in-memory analytics