Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment. This paper describes three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck. These techniques include: (1) the Summary Vector, a compact in-memory data structure for identifying new segments; (2) Stream-Informed Segment Layout, a data layout method t...
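The abstract describes the Summary Vector only as a compact in-memory structure for identifying new segments. The following is a minimal sketch of that idea, assuming a Bloom-filter-style bit array; it is not taken from the Data Domain implementation. The names SummaryVector, might_contain, and fingerprint, the bit-array size, the number of hash functions, and the use of SHA-1 and SHA-256 are all hypothetical choices made for illustration.

```python
import hashlib


class SummaryVector:
    """Bloom-filter-style bit array (illustrative assumption, not the production design)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fp: bytes):
        # Derive one bit position per hash function by salting the fingerprint.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fp: bytes) -> None:
        # Record a stored segment by setting all of its bit positions.
        for pos in self._positions(fp):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp: bytes) -> bool:
        # False means the segment is definitely new, so no on-disk index
        # lookup is needed. True means "possibly stored" and must be confirmed.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(fp)
        )


def fingerprint(segment: bytes) -> bytes:
    # Hypothetical segment identifier: a SHA-1 digest of the segment contents.
    return hashlib.sha1(segment).digest()


if __name__ == "__main__":
    sv = SummaryVector()
    sv.add(fingerprint(b"segment already in the store"))
    for data in (b"segment already in the store", b"brand new segment"):
        verdict = "check index" if sv.might_contain(fingerprint(data)) else "new segment"
        print(data, "->", verdict)
```

In a structure of this kind, a negative answer is exact while a positive answer can be a false positive. That asymmetry is why it suits the task the abstract names: new segments can be recognized in memory, and only possible duplicates require consulting the on-disk index.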

Benjamin Zhu, Kai Li, R. Hugo Patterson

Data Segments | Enterprise Data Protection | FAST 2008 | Operating System | Redundant Data Segments |

Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2008
Where	FAST
Authors	Benjamin Zhu, Kai Li, R. Hugo Patterson
