Erasure Codes with Efficient Repair

semanticscholar(2011)

Abstract
We consider erasure coding in the context of storage applications and propose a metric to capture the complexity of symbol operations associated with dynamically repairing erased code symbols. We then propose a simple construction that has a drastically improved repair complexity compared to standard erasure codes, at the expense of a slight increase in the overhead. We also propose a systematic rateless code which has constant encoding/decoding complexity, again at the expense of a small overhead parameter. The original systematic Raptor code proposed by Shokrollahi has linear per-symbol encoding complexity.

I. INTRODUCTION

While the benefits of erasure coding for storage are well understood with regard to the savings in the number of storage units for a given level of redundancy, other important aspects that favor replication over coding are not as well understood. The most significant of these issues ([1]) is the latency related to recovery from failures. Another related issue that has been pointed out is the large state dependency, which refers to the need to contact a potentially large number of servers/storage units for each recovery. An efficient decoding/encoding algorithm does not, by itself, provide an efficient recovery algorithm, because decoding efficiency in the context of codes designed for communication rests on the basic assumption of recovering the entire message block.

Even though the storage gains of coding over replication are extremely large (by a factor that scales with the number of message blocks being coded), storage cost itself may matter less than other factors such as repair latency. As long as a code delivers the order-of-magnitude storage improvements over replication, strict information-theoretic optimality of the code's storage overhead is less significant than the ability to repair destroyed code symbols efficiently.

Another issue where fixed-rate code designs face an obstacle, unlike replication, is the flexibility of adding or deleting storage units dynamically ([2]). When a code is designed with a fixed rate beforehand, adding or deleting code symbols no longer preserves the guarantees provided for the original design.

II. REPAIR COMPLEXITY

The repair complexity of a code refers to the complexity of the operations required to reconstruct subsets of its code symbols using other code symbols. More precisely, let C be an erasure code with its symbols indexed by the integers in [n]. For some D ⊆ [n] (the destroyed set) and another, disjoint subset R ⊆ [n] (the recovery set), a repair algorithm reconstructs the code symbols indexed by D using those from R, when feasible. Typically, one may take R = [n] \ D. The repair complexity of a given reconstruction instance is then the average (over the number of elements in D) number of symbol operations performed by the repair algorithm:

$$ R(D) = \frac{\#\ \text{of symbol operations in repairing}\ D}{|D|} \qquad (1) $$

A code on source data divided into k symbols is said to have overhead δ if any k(1+δ) code symbols are guaranteed to be sufficient to reconstruct the source data. When δ = 0, the code is optimal with respect to its overhead. Loosely speaking, the ease of repair is closely tied to the overhead: the closer to optimal the overhead, the harder the code is to repair.
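To make the metric in (1) concrete, here is a small Python sketch, written for this summary rather than taken from the paper, that counts whole-symbol XOR operations while repairing a single erasure of a (k+1, k) single-parity code and then evaluates R(D). The counting convention (one operation per surviving symbol folded into the accumulator) and all identifiers are illustrative assumptions; the parity example itself is treated analytically in the examples subsection that follows.

```python
# Illustrative sketch (not from the paper): count symbol operations during a
# repair and evaluate R(D) from Eq. (1).  A "symbol operation" here is one
# XOR of two whole symbols.

from functools import reduce

def xor_symbols(a: bytes, b: bytes) -> bytes:
    """One symbol operation: bitwise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))

def repair_single_parity(survivors: dict[int, bytes]) -> tuple[bytes, int]:
    """Repair the single erased symbol of a (k+1, k) parity code.

    `survivors` maps the k surviving indices to their symbols.  Returns the
    repaired symbol and the number of symbol operations performed.
    """
    ops = 0
    acc = bytes(len(next(iter(survivors.values()))))    # all-zero symbol
    for s in survivors.values():
        acc = xor_symbols(acc, s)                        # one symbol operation
        ops += 1
    return acc, ops

def repair_complexity(ops: int, num_destroyed: int) -> float:
    """R(D) = (# of symbol operations in repairing D) / |D|, as in Eq. (1)."""
    return ops / num_destroyed

# Example: k = 4 data symbols plus one parity symbol; symbol 2 is destroyed.
data = [bytes([10]), bytes([20]), bytes([30]), bytes([40])]
codeword = data + [reduce(xor_symbols, data)]            # append parity
survivors = {i: s for i, s in enumerate(codeword) if i != 2}
repaired, ops = repair_single_parity(survivors)
print(repaired == data[2], repair_complexity(ops, num_destroyed=1))  # True 4.0
```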
This tradeoff has been investigated extensively in prior work for a broadly related problem, the repair bandwidth minimization problem. However, our work has some fundamental differences with that line of work, as explained below.

Distinction from the bandwidth minimization problem: There has been substantial work on optimizing the repair bandwidth against the overhead (e.g., [3] and the references therein). There are, however, two fundamental differences with the notion of repair complexity considered here. (1) In this paper we consider a code symbol as a single atomic unit of memory that cannot be split further. In [3], each node stores data that can be split into subunits and the goal is to optimize the bandwidth across nodes; in other words, any operational complexity within a node is not part of the optimization. (2) Our goal of minimizing the number of symbol operations required for reconstruction is more closely aligned with disk access than with the bandwidth communicated across nodes. For bandwidth optimization ([3]), the designed codes are expected to download minuscule, uniform amounts of compressed/coded data from each of a large number of nodes to save on the final bandwidth, whereas the total number of symbol operations performed prior to communication can be quite expensive because of the large number of symbols accessed.

A. Repair complexity of examples

1) Parity Code: A parity code of dimension k is $\{v \in \{0,1\}^{k+1} : v_{k+1} = \sum_{i=1}^{k} v_i\}$. Let D = {i} for some i ∈ [k+1] and R = [k+1] \ D. The repair is performed via the relation $v_i = \sum_{j \neq i} v_j$, which involves k symbol operations, so the repair complexity is k.

2) Repetition Code: Let the code be $\{\mathbf{v} \in \{0,1\}^{kl} : \mathbf{v} = [\,\underbrace{v\; v\; \cdots\; v}_{l\ \text{times}}\,] \text{ for some } v \in \{0,1\}^{k}\}$. Let D ⊆ [kl] be such that, for each residue class modulo k, the number of indices of D in that class is at most l − 1. To repair D we need to perform at most |D| symbol operations, copying each destroyed symbol from a surviving copy in R, implying a repair complexity of 1. But the overhead of this code is $\delta = l\,\frac{k-1}{k} + \frac{1}{k} - 1$, which is arbitrarily bad for large values of l.

3) Rateless Codes (Random Linear Codes, LT Codes, Raptor Codes): Consider a rateless code of dimension k, i.e., the source message consists of k symbols, whose encoding and decoding complexities are on average α and β per symbol, respectively, and whose overhead is δ. For random linear codes (RLC), α = β = Θ(k) and δ = o(k); for LT codes, α = β = O(log k) and δ = o(k); and for Raptor codes, α = β = O(1) and δ = ε, a small positive number. Consider a recovery set R with |R| = k(1+δ). A natural repair algorithm is: (1) decode the source symbols S from R; (2) encode the missing code symbols D using S. Under the given assumptions, step 1 incurs βk symbol operations and step 2 incurs α|D| symbol operations. Therefore, the repair complexity for D is

$$ \frac{\beta k + \alpha|D|}{|D|} = \beta\,\frac{k}{|D|} + \alpha. $$

This could be efficient if |D| ≈ k, but when |D| is small, even constant encoding/decoding complexities do not yield a correspondingly efficient repair guarantee.

III. THE AUGMENTED LT CODE

Let S = {s_1, …, s_k} represent the data to be stored, divided into fragments, which are also called source symbols. Let c_1, c_2, … represent an LT coded stream generated on S with degree distribution Ω on [k]. The augmented LT code is the rateless code formed by adjoining the uncoded source symbols to the LT coded stream, i.e., {s_1, …, s_k, c_1, c_2, …}.
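The following Python sketch shows one way this augmented construction could look in code: LT symbols are generated from a degree distribution Ω (a toy distribution here, not the robust soliton of a real deployment) and identified by seeds standing in for packet headers, and the k uncoded source symbols are adjoined to the stream. All names, the seed-as-header convention, and the distribution are illustrative assumptions, not the paper's specification.

```python
# Illustrative sketch of the augmented LT code {s_1, ..., s_k, c_1, c_2, ...}.
# A coded symbol c_j is the XOR of a random subset of source symbols whose
# size is drawn from a degree distribution Omega; its seed plays the role of
# the packet header, so the neighbor set is recomputable without touching data.

import random
from functools import reduce

def xor_symbols(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def lt_neighbors(seed: int, k: int, omega: list[float]) -> list[int]:
    """Neighbor set (source indices) of one LT symbol, derived from its seed."""
    rng = random.Random(seed)
    degree = rng.choices(range(1, len(omega) + 1), weights=omega)[0]
    return rng.sample(range(k), min(degree, k))

def lt_symbol(source: list[bytes], seed: int, omega: list[float]) -> bytes:
    """XOR of the source symbols selected by lt_neighbors(seed, ...)."""
    idx = lt_neighbors(seed, len(source), omega)
    return reduce(xor_symbols, (source[i] for i in idx))

def augmented_lt_stream(source: list[bytes], n_coded: int, omega: list[float]):
    """Yield (header, payload): the k uncoded source symbols first, then
    LT coded symbols identified by their seeds."""
    for i, s in enumerate(source):
        yield ("source", i), s
    for seed in range(n_coded):
        yield ("coded", seed), lt_symbol(source, seed, omega)

# Example: k = 4 source fragments, six coded symbols appended to the stream.
omega = [0.1, 0.5, 0.3, 0.1]                  # toy distribution on degrees 1..4
src = [bytes([b]) for b in (1, 2, 3, 4)]
stream = dict(augmented_lt_stream(src, n_coded=6, omega=omega))
```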
A. Repair Algorithm for the Augmented LT Code

Let D be the set of indices of the code symbols to be repaired and let R be a recovery set. Let R_C = R ∩ {c_1, c_2, …} and R_S = R ∩ S (so R = R_C ∪ R_S). Let |D| = t, and denote D_S = D ∩ S and D_C = D \ S = D ∩ {c_1, c_2, …}. Consider the case when |R_C| ≥ k(1 + ε), where ε > 0 and k is large. The repair algorithm for D from R is given below.

1) Since |R_C| ≥ k(1 + ε), w.h.p. it is feasible to iteratively decode S from R_C. Let s_{π(1)}, s_{π(2)}, …, s_{π(k)} be a sequence in which the source symbols can be decoded from R_C, and let c_{ξ(1)}, …, c_{ξ(k)} be the corresponding sequence in which code symbols from R_C are processed from the gross ripple during the decoding process. Both π and ξ can be computed without performing any symbol operations, by processing only the packet headers or the random number generator seed used for the code construction. We have:
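As an illustrative sketch of step 1, the following Python code computes a peeling (iterative-decoding) schedule π, ξ from neighbor sets alone; in the augmented LT setting those sets would be reconstructed from the packet headers or the construction seed (e.g., via an lt_neighbors-style helper as sketched earlier), so, as noted above, no symbol operations are required. The identifiers and the stall handling are our own illustrative choices, not the paper's algorithm statement.

```python
# Illustrative sketch: compute the decoding order pi and the ripple-release
# order xi for iterative LT decoding of S from R_C, using neighbor sets only
# (no XORs on symbol payloads are performed here).

def peeling_schedule(neighbor_sets: dict[int, set[int]], k: int):
    """`neighbor_sets` maps a coded-symbol id in R_C to the set of source
    indices it covers.  Returns (pi, xi), where pi[t] is the t-th source
    symbol decoded and xi[t] is the coded symbol released from the ripple to
    decode it.  Raises if peeling stalls before all k sources are scheduled."""
    neigh = {cid: set(ns) for cid, ns in neighbor_sets.items()}
    cover = {s: {cid for cid, ns in neigh.items() if s in ns} for s in range(k)}
    ripple = [cid for cid, ns in neigh.items() if len(ns) == 1]
    decoded, pi, xi = set(), [], []
    while ripple and len(decoded) < k:
        cid = ripple.pop()
        if len(neigh[cid]) != 1:            # became redundant after earlier peels
            continue
        (s,) = neigh[cid]
        decoded.add(s)
        pi.append(s)
        xi.append(cid)
        for other in cover[s]:              # peel s out of every covering symbol
            neigh[other].discard(s)
            if len(neigh[other]) == 1:
                ripple.append(other)
    if len(decoded) < k:
        raise RuntimeError("peeling stalled; a larger R_C is needed")
    return pi, xi

# Example with k = 3 source symbols and four coded symbols (ids 0..3) in R_C.
pi, xi = peeling_schedule({0: {0}, 1: {0, 1}, 2: {1, 2}, 3: {0, 2}}, k=3)
print(pi, xi)    # a valid peeling order, e.g. [0, 2, 1] and [0, 3, 2]
```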