In this paper, we propose hardware-assisted data compression as a tool for reducing the energy consumption of core-based embedded systems. We present a novel and efficient architecture for on-the-fly data compression and decompression that operates on the cache-to-memory path. Cache lines are compressed before they are written back to main memory and decompressed when cache refills take place. We explore two classes of compression methods, profile-driven and differential, since they are characterized by compact HW implementations, and we compare their performance with that of some state-of-the-art compression methods (for example, a few variants of the Lempel-Ziv encoder). We present experimental results on memory traffic and energy consumption in the cache-to-memory path of a core-based system running standard benchmark programs. The achieved average energy savings range from 4.2% to 35.2%, depending on the selected compression algorithm.
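To make the idea of differential compression on the cache-to-memory path concrete, the following is a minimal software sketch, not the architecture described in this paper. It assumes a hypothetical scheme in which a cache line is stored as its first word plus a short signed difference for each remaining word, with a one-bit flag marking lines that could not be compressed. The names `compress_line`, `decompress_line`, and `line_bits`, and the 32-bit word and 8-bit difference widths, are illustrative assumptions.

```python
from typing import List, Tuple

WORD_BITS = 32   # assumed word width
DELTA_BITS = 8   # assumed width of each stored signed difference


def compress_line(words: List[int]) -> Tuple[bool, List[int]]:
    """Model the write-back step for one cache line.

    Returns (True, [base, d1, d2, ...]) when every difference from the
    first word fits in a signed DELTA_BITS field; otherwise returns
    (False, words) so the line is stored uncompressed.
    """
    base = words[0]
    lo = -(1 << (DELTA_BITS - 1))
    hi = (1 << (DELTA_BITS - 1)) - 1
    deltas = [w - base for w in words[1:]]
    if all(lo <= d <= hi for d in deltas):
        return True, [base] + deltas
    return False, list(words)


def decompress_line(compressed: bool, payload: List[int]) -> List[int]:
    """Model the refill step: rebuild the original words."""
    if not compressed:
        return list(payload)
    base = payload[0]
    return [base] + [base + d for d in payload[1:]]


def line_bits(compressed: bool, payload: List[int]) -> int:
    """Bits transferred for one line, including the one-bit flag."""
    if not compressed:
        return 1 + WORD_BITS * len(payload)
    return 1 + WORD_BITS + DELTA_BITS * (len(payload) - 1)


if __name__ == "__main__":
    line = [0x1000, 0x1004, 0x1008, 0x100C]   # words with similar values
    flag, payload = compress_line(line)
    assert decompress_line(flag, payload) == line
    print(flag, line_bits(flag, payload), "bits instead of", WORD_BITS * len(line))
```

In this toy model, a line whose words hold nearby values (such as consecutive addresses) shrinks considerably, while a line with widely scattered values costs one extra flag bit. The traffic reduction, and hence any energy saving, therefore depends on how often lines compress, which is what the experimental results quantify for the methods actually studied.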