In this work, we present accordion arrays, a straightforward and effective memory compression technique targeting Unicode-based character arrays. In many non-numeric Java programs, character arrays represent a significant fraction (30-40% on average) of the heap memory allocated. In many locales, most, but not all, of those arrays consist entirely of characters whose top bytes are zeros, and, hence, can be stored as byte arrays without loss of information. In order to get the almost factor of two compression rate for character vectors, two challenges must be overcome: 1) all code that reads and writes character vectors must dynamically determine which kind of array is being accessed and perform byte or character loads/stores as appropriate, and 2) compressed vectors must be dynamically inflated when an incompressible character is written. We demonstrate how these challenges can be overcome with minimal space and execution time overhead, resulting in an average speedup of 2% across o...
Craig B. Zilles