FAST
2008

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment. This paper describes three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck. These techniques include: (1) the Summary Vector, a compact in-memory data structure for identifying new segments; (2) Stream-Informed Segment Layout, a data layout method t...
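The abstract describes the Summary Vector as a compact in-memory structure that can tell whether a segment is definitely new without touching the on-disk index; a Bloom filter is the classic structure with exactly this property. The sketch below is a hypothetical minimal illustration (class and parameter names are invented, not the production Data Domain code):

```python
import hashlib

class SummaryVector:
    """Bloom-filter sketch of the Summary Vector idea: a compact bitmap
    that answers "definitely new" or "possibly stored" for a segment
    fingerprint, avoiding a disk index lookup for new segments."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)  # all bits start cleared

    def _positions(self, fingerprint: bytes):
        # Derive k bit positions by salting the fingerprint hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, fingerprint: bytes) -> None:
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        # False => segment is definitely new: skip the on-disk index.
        # True  => possibly a duplicate: consult the index or cache.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))

sv = SummaryVector()
sv.add(b"fingerprint-of-segment-A")
stored = sv.might_contain(b"fingerprint-of-segment-A")   # True: possibly stored
unseen = sv.might_contain(b"fingerprint-of-segment-B")   # almost certainly False
```

A "no" answer is always exact, so every genuinely new segment skips the disk entirely; only the small false-positive fraction of lookups falls through to the on-disk index, which is what relieves the index-lookup bottleneck at backup throughput.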
Benjamin Zhu, Kai Li, R. Hugo Patterson
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference