Rethinking The Data Model: The Drillbit Proof-Of-Concept Library

20TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2013), PARTS 1-6 (2014)

Abstract
The focus of many software architectures of the LHC experiments is to deliver a well-designed Event Data Model (EDM). Changes and additions to the stored data are often expensive, requiring large amounts of CPU time, disk storage and manpower. In addition, differing needs between groups of physicists lead common data formats to grow in the information they contain while still failing to serve all needs. We introduce a new way of thinking about the data model based on the Dremel column-store architecture published by Google. We present an EDM concept based on Dremel, which has the potential to significantly reduce the storage requirements of these common formats, decrease the time needed for independent physicists to compare their results, and improve the speed at which data reprocessings can feasibly take place. The Dremel low-level encoding is implemented in a proof-of-concept C++ library called Drillbit, and it is shown that using a different encoding of the current data could save as much as 20% of disk space on average across a large number of real-world derived data sets.
Key words
data model, proof-of-concept