DCC
2010
IEEE

Lossless Compression Based on the Sequence Memoizer

In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy coding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] for language modelling, captures long-range dependencies by permitting conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, which allows it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
Jan Gasthaus, Frank Wood, Yee Whye Teh
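
The pairing of a predictive sequence model with an entropy coder, as described in the abstract above, can be illustrated with a rough sketch. This is not the Sequence Memoizer and not the authors' implementation: it assumes a bounded context depth, a single fixed discount, zero concentration, and a one-table-per-symbol-type simplification of the Pitman-Yor predictive rule (which amounts to interpolated absolute discounting). It also omits the arithmetic coder and instead sums -log2 p(symbol | context), the ideal code length an entropy coder would approach. The names `BackoffModel`, `ideal_code_length_bits`, and the constants are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 256  # assumption: byte-oriented input
DISCOUNT = 0.5       # assumption: one fixed discount for every context
MAX_DEPTH = 3        # assumption: bounded context length, unlike the paper's model


class BackoffModel:
    """Simplified hierarchical discounting model over byte contexts."""

    def __init__(self, max_depth=MAX_DEPTH, discount=DISCOUNT):
        self.max_depth = max_depth
        self.discount = discount
        # counts[context][symbol]; a context is a tuple of preceding bytes
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, context, symbol):
        """Predictive probability, interpolated from the shortest context up."""
        context = tuple(context)
        p = 1.0 / ALPHABET_SIZE  # uniform base distribution
        for k in range(len(context) + 1):
            node = self.counts.get(context[len(context) - k:])
            if not node:
                continue  # unseen context: inherit the shorter context's estimate
            total = sum(node.values())
            types = len(node)  # one "table" per distinct symbol (simplification)
            d = self.discount
            p = (max(node.get(symbol, 0) - d, 0.0) + d * types * p) / total
        return p

    def update(self, context, symbol):
        """Count the symbol in the longest context; pass it on to shorter
        contexts only while it is new at the current level."""
        context = tuple(context)
        for k in range(len(context), -1, -1):
            node = self.counts[context[len(context) - k:]]
            first_time = symbol not in node
            node[symbol] += 1
            if not first_time:
                break


def ideal_code_length_bits(data, model=None):
    """Sum of -log2 p(symbol | context) under an adaptively updated model."""
    model = model or BackoffModel()
    bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - model.max_depth):i]
        bits += -math.log2(model.prob(context, symbol))
        model.update(context, symbol)  # a decoder can mirror this update
    return bits
```

Calling `ideal_code_length_bits(b"abracadabra" * 20)` returns a bit count that can be compared with the 8 bits per byte of the raw input. Because the model is updated only after each symbol is scored, a decoder running the same updates would see identical probabilities, which is what makes adaptive entropy coding lossless.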
Added: 17 May 2010
Updated: 17 May 2010
Type: Conference
Year: 2010
Where: DCC
Authors: Jan Gasthaus, Frank Wood, Yee Whye Teh