Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads—loads accessing memory loc...
Md. Mafijul Islam, Sally A. McKee, Per Stenströ...
Many modern embedded processors (especially DSPs) support partitioned memory banks (also called X-Y memory or dual-bank memory) along with parallel load/store instructions to achieve co...
Xiaotong Zhuang, Santosh Pande, John S. Greenland ...
We identify that a set of multimedia applications exhibit highly regular read-after-read (RAR) and read-after-write (RAW) memory dependence streams. We exploit this regularity to ...
In this paper, we present a method for approximating the values of sensors in a wireless sensor network based on time series forecasting. More specifically, our approach relies on ...