This paper discusses a register bank assignment problem for a popular network processor--Intel's IXP. Due to limited data paths, the network processor has a restriction that ...
Improving cache performance requires understanding cache behavior. However, measuring cache performance for one or two data input sets provides little insight into how cache behav...
In this paper, we study the effects of manipulating the architected direction of conditional branches. Through the use of statistical sampling, we find that about 40% of all dyna...
This paper analyzes an Intel Pentium 4 hyper-threading processor. The focus is to understand its performance and the underlying reasons behind that performance. Particular attenti...
Simultaneous multithreading (SMT) increases processor throughput by multiplexing resources among several threads. Despite the commercial availability of SMT processors, several as...
We present a technique for reducing the power dissipation in the course of writebacks and commitments in a datapath that uses a dedicated architectural register file (ARF) to hol...
Dmitry Ponomarev, Gurhan Kucuk, Oguz Ergin, Kanad ...
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. ...
Recent research shows that the high occupancy of Coherence Controllers (CCs) is a major performance bottleneck in scalable shared-memory multiprocessors. In this paper, we propose...
Recent work has shown that multithreaded workloads running in execution-driven, full-system simulation environments cannot use instructions per cycle (IPC) as a valid performance ...
Effective modeling and management of hardware resources have always been critical to generating highly efficient code in static compilers. With Just-In-Time compilation and dy...