A variable-to-fixed length encoder partitions the source string into variable-length phrases that belong to a given and fixed dictionary. Tunstall, and independently Khodak, designed variable-to-fixed length codes for memoryless sources that are optimal under certain constraints. In this paper, we study the Tunstall and Khodak codes using variety of techniques ranging from stopping times for sums of independent random variables to Tauberian theorems and Mellin transform. After proposing an algebraic characterization of the Tunstall and Khodak codes, we present new results on the variance and a central limit theorem for dictionary phrase lengths. This analysis also provides a new argument for obtaining asymptotic results about the mean dictionary phrase length and average redundancy rates.
Michael Drmota, Yuriy A. Reznik, Wojciech Szpankow