In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the fl...
Ramon Canal, Joan-Manuel Parcerisa, Antonio Gonz&a...
Exponentially growing bandwidth requirements and slowing gains in processor speeds have led to the popularity of multiprocessor architectures. Network stack parallelism is increas...
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, ...
We design and implement Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of ...
Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govi...
level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unrolland-jam transformations exploit the large regist...
Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M....