In this paper, we introduce PTLsim, a cycle accurate full system x86-64 microprocessor simulator and virtual machine. PTLsim models a modern superscalar out of order x86-64 proces...
—Although SIMD extensions are a cost effective way to exploit the data level parallelism present in most media applications, we will show that they had have a very limited memory...
Architectures are usually compared by running the same workload on each architecture and comparing performance. When a single compiled binary of a program is executed on many diff...
Erez Perelman, Jeremy Lau, Harish Patil, Aamer Jal...
For simulation, a tradeoff exists between speed and accuracy. The more instructions simulated from the workload, the more accurate the results — but at a higher cost. To reduce ...
— The design trend towards CMPs has made the simulation of multiprocessor systems a necessity and has also made multiprocessor systems widely available. While a serial multiproce...
Address re-mapping techniques in so-called active memory systems have been shown to dramatically increase the performance of applications with poor cache and/or communication beha...
Dhiraj D. Kalamkar, Mainak Chaudhuri, Mark Heinric...
We have studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM) designed for multi-core processors. FB-DIMM has a unique two-level interconnect structure, with FB-DIM...
Parameter variation due to manufacturing error will be an unavoidable consequence of technology scaling in future generations. The impact of random variation in physical factors s...
Ke Meng, Frank Huebbers, Russ Joseph, Yehea I. Ism...