In this paper, we present an efficient procedure for building a piecewise linear function approximation of the speed function of a processor with hierarchical memory structure. Th...
Cache memories were invented to decouple fast processors from slow memories. However, this decoupling is only partial, and many researchers have attempted to improve cache use by p...
Spatial-lattice computations with finite-range interactions are an important class of easily parallelized computations. This class includes many simple and direct algorithms for ...
The growing gap between on-chip gates and off-chip I/O bandwidth argues for ever larger amounts of on-chip memory. Emerging portable consumer technology, such as digital cameras, ...
This paper presents Xetal-Pro SIMD processor, which is based on Xetal-II, one of the most computational-efficient (in terms of GOPS/Watt) processors available today. XetalPro supp...
Yifan He, Yu Pu, Richard P. Kleihorst, Zhenyu Ye, ...