FAST
2010

Bimodal Content Defined Chunking for Backup Streams

Data deduplication has become a popular technology for reducing the amount of storage space necessary for backup and archival data. Content defined chunking (CDC) techniques are well-established methods of separating a data stream into variable-size chunks such that duplicate content has a good chance of being discovered irrespective of its position in the data stream. Requirements for CDC include fast and scalable operation, as well as good duplicate elimination. While the latter can be achieved by using chunks of small average size, doing so also increases the amount of metadata needed to store the more numerous chunks and negatively impacts the system's performance. We propose a new approach that achieves comparable duplicate elimination while using chunks of larger average size. It involves using two chunk size targets, and mechanisms that dynamically switch between the two based on querying data already stored; we use small chunks in limited regions of t...
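To make the idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a gear-style rolling hash for the content-defined cut points and one hypothetical switching policy: chunk with the large size target first, query a set of already-stored chunk hashes, and re-chunk with the small target only those new large chunks that sit next to a duplicate. The names `GEAR`, `cdc_chunks`, `bimodal_chunks`, and `stored`, as well as the default size parameters, are illustrative assumptions and do not come from the abstract.

```python
import hashlib

# Hypothetical per-byte table for a gear-style rolling hash (an assumption of
# this sketch); derived from SHA-256 so the values are reproducible.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def cdc_chunks(data, avg_bits, min_size=32, max_size=1 << 16):
    """Split data into variable-size chunks at content-defined cut points.

    A cut is declared where the top avg_bits bits of the rolling hash are all
    zero, which gives an expected chunk size of roughly 2**avg_bits bytes
    beyond min_size. The hash depends only on the last 32 bytes seen.
    """
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # older bytes shift out of 32 bits
        size = i - start + 1
        at_cut = size >= min_size and (h >> (32 - avg_bits)) == 0
        if at_cut or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def bimodal_chunks(data, stored, big_bits=10, small_bits=6):
    """Chunk with the large target, then re-chunk selected regions small.

    stored is a set of SHA-256 digests of chunks already in the store.
    Assumed policy: a large chunk that is new but adjacent to a duplicate
    large chunk is split again using the small size target.
    """
    big = cdc_chunks(data, big_bits)
    dup = [hashlib.sha256(c).digest() in stored for c in big]
    out = []
    for i, c in enumerate(big):
        near_dup = (i > 0 and dup[i - 1]) or (i + 1 < len(big) and dup[i + 1])
        if not dup[i] and near_dup:
            out.extend(cdc_chunks(c, small_bits))
        else:
            out.append(c)
    return out
```

Under these assumptions, small chunks (and their extra metadata) appear only around the boundary between new and previously stored data, while long runs of entirely new or entirely duplicate data keep the large chunk size.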
Erik Kruus, Cristian Ungureanu, Cezary Dubnicki
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2010
Where FAST