Efficient Similarity Estimation for Systems Exploiting Data Redundancy


Download www.cs.cmu.edu

Many modern systems exploit data redundancy to improve efficiency. These systems split data into chunks, generate an identifier for each chunk, and compare these identifiers against those of other data items to find duplicate chunks. As a result, chunk size becomes a critical parameter for the efficiency of these systems: it trades potentially improved similarity detection (smaller chunks) against the increased overhead of representing more chunks. Unfortunately, the similarity between files increases unpredictably with smaller chunk sizes, even for data of the same type. Existing systems often pick one chunk size that is "good enough" for many cases because they lack efficient techniques to determine the benefits at other chunk sizes. This paper addresses this deficiency via two contributions: (1) we present multi-resolution (MR) handprinting, an application-independent technique that efficiently estimates similarity between data items at different chunk sizes using a compact, multi-size repre...
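The abstract does not spell out how MR handprinting is constructed, so the following is only a minimal Python sketch of the general idea it describes, under loudly stated simplifications. It uses fixed-size chunks (many deduplication systems instead choose content-defined boundaries, for example with Rabin fingerprints), SHA-1 chunk identifiers, and a "handprint" modeled as a bottom-k sample (the k numerically smallest chunk IDs), from which Jaccard similarity can be estimated. All function names (`chunk_ids`, `handprint`, `estimate_similarity`, `multi_resolution_handprints`) are hypothetical, and this is not the authors' construction.

```python
import hashlib
import heapq
import random


def chunk_ids(data: bytes, chunk_size: int) -> set[bytes]:
    """Split data into fixed-size chunks (a simplification) and hash each with SHA-1."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    }


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    """Exact similarity: shared chunk IDs divided by all distinct chunk IDs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def handprint(ids: set[bytes], k: int) -> set[bytes]:
    """Hypothetical handprint: the k smallest chunk IDs (a bottom-k sample)."""
    return set(heapq.nsmallest(k, ids))


def estimate_similarity(hp_a: set[bytes], hp_b: set[bytes], k: int) -> float:
    """Bottom-k estimate of Jaccard similarity using only the two handprints."""
    # The k smallest IDs of the full union are always contained in hp_a | hp_b.
    union_k = set(heapq.nsmallest(k, hp_a | hp_b))
    if not union_k:
        return 1.0
    return len(union_k & hp_a & hp_b) / len(union_k)


def multi_resolution_handprints(
    data: bytes, chunk_sizes: list[int], k: int
) -> dict[int, set[bytes]]:
    """Keep one compact handprint per candidate chunk size."""
    return {size: handprint(chunk_ids(data, size), k) for size in chunk_sizes}


# Usage: two 64 KiB items that differ in exactly one byte.
rng = random.Random(0)
a = rng.randbytes(64 * 1024)
b_mut = bytearray(a)
b_mut[10_000] ^= 0xFF  # flip every bit of one byte so the items must differ
b = bytes(b_mut)

sizes, k = [1024, 8192], 16
hp_a = multi_resolution_handprints(a, sizes, k)
hp_b = multi_resolution_handprints(b, sizes, k)
for size in sizes:
    exact = jaccard(chunk_ids(a, size), chunk_ids(b, size))
    approx = estimate_similarity(hp_a[size], hp_b[size], k)
    print(f"chunk={size:5d}  exact={exact:.3f}  estimated={approx:.3f}")
```

With fixed-size chunks and random data, the single changed byte spoils one 1 KiB chunk out of 64 (exact similarity 63/65) but one 8 KiB chunk out of 8 (exact similarity 7/9). This illustrates the trade-off above: smaller chunks expose more shared content but require more identifiers. Each handprint here stores at most k = 16 IDs regardless of item size, which is what keeps a per-size sample compact.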

Kanat Tangwongsan, Himabindu Pucha, David G. Andersen, Michael Kaminsky


Chunk Size | Communications | INFOCOM 2010 | Smaller Chunk | Systems


» Exploiting In-Memory and On-Disk Redundancy to Conserve Energy in Storage Systems

» Exploiting content redundancy for web information extraction

» Efficient search in large textual collections with redundancy

» Efficient feature weighting methods for ranking

» Tree-Pattern Similarity Estimation for Scalable Content-based Routing

» Fast Multi-view Disparity Estimation for Multi-view Video Systems

» Orthrus: efficient software integrity protection on multi-cores

» Efficient error estimating coding: feasibility and applications

Post Info

Added	13 Feb 2011
Updated	13 Feb 2011
Type	Conference
Year	2010
Where	INFOCOM
Authors	Kanat Tangwongsan, Himabindu Pucha, David G. Andersen, Michael Kaminsky
