It is known that universal compression of strings generated by i.i.d. sources over infinite alphabets entails infinite per-symbol redundancy. Continuing previous work [1], we consider alternative compression schemes that decompose the description of such strings into a description of the symbols appearing in the string and a description of the arrangement the symbols form. We consider two descriptions of the symbol arrangement: shapes and patterns. Roughly speaking, shapes describe the relative magnitude of the symbols, while patterns describe only the order in which they appear. We prove that the per-symbol worst-case redundancy of compressing shapes is a positive constant less than one, and that the per-symbol redundancy of compressing patterns diminishes to zero as the blocklength increases. We also mention some results on sequential pattern compression.
Alon Orlitsky, Narayana P. Santhanam
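
To make the distinction between the two arrangement descriptions concrete, the following minimal Python sketch computes both for a short string. It assumes that a pattern replaces each symbol by the index of its first appearance, and that a shape replaces each symbol by its rank among the distinct symbols of the string; both readings are inferred from the rough description above, the formal definitions may differ in detail, and the function names `pattern` and `shape` are illustrative only.

```python
def pattern(seq):
    """Replace each symbol by the index (from 1) of its first appearance.

    Assumed reading of "patterns describe only the order in which
    the symbols appear".
    """
    first_seen = {}
    out = []
    for s in seq:
        if s not in first_seen:
            # A new symbol gets the next unused index.
            first_seen[s] = len(first_seen) + 1
        out.append(first_seen[s])
    return out


def shape(seq):
    """Replace each symbol by its rank (from 1) among the distinct symbols.

    Assumed reading of "shapes describe the relative magnitude
    of the symbols"; requires the symbols to be orderable.
    """
    rank = {s: i for i, s in enumerate(sorted(set(seq)), start=1)}
    return [rank[s] for s in seq]


if __name__ == "__main__":
    word = "abracadabra"
    print(pattern(word))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
    print(shape(word))    # [1, 2, 5, 1, 3, 1, 4, 1, 2, 5, 1]
```

Under these assumptions the shape retains the relative order of the symbol values (here `r` is the largest symbol and maps to 5), whereas the pattern discards it and keeps only the order of first appearance (so `r` maps to 3).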