New portable consumer embedded devices must execute multimedia applications (e.g., 3D games, video players and signal processing software, etc.) that demand extensive memory acces...
The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. If known to exist in an application, strided memory accesses fo...
Tushar Mohan, Bronis R. de Supinski, Sally A. McKe...
We investigate the implementation of IP look-up for core routers using multiple microengines and a tailored memory hierarchy. The main architectural concerns are limiting the numb...
Jean-Loup Baer, Douglas Low, Patrick Crowley, Neal...
Cache optimizations typically include code transformations to increase the locality of memory accesses. An orthogonal approach is to enable for latency hiding by introducing prefet...
Instruction combining is an optimization to replace a sequence of instructions with a more efficient instruction yielding the same result in a fewer machine cycles. When we use it...
We propose a novel algorithm based on random graphs to construct minimal perfect hash functions h. For a set of n keys, our algorithm outputs h in expected time O(n). The evaluatio...
Fabiano C. Botelho, Yoshiharu Kohayakawa, Nivio Zi...
As network traffic continues to increase and with the requirement to process packets at line rates, high performance routers need to forward millions of packets every second. Eve...
Due to the powerful error correcting performance, turbo codes have been adopted in many wireless communication standards. Although several low-power techniques have been proposed,...
It is widely known that parallel operation execution in multiprocessor systems generates a respective increase in memory accesses. Since the memory and bus subsystems provide a li...
Grigoris Dimitroulakos, Michalis D. Galanis, Costa...
—As turbo decoding is a highly memory-intensive algorithm consuming large power, a major issue to be solved in practical implementation is to reduce power consumption. This paper...