Sciweavers

IPPS
2009
IEEE

Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture

14 years 7 months ago
Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture
DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on existing simple cores (in-order pipelines, no branch predictors, no ROBs). In DTA, the local variables and synchronization data are communicated via a fast frame memory. If the compiler can not remove global data accesses, the threads are excessively fragmented. Therefore, in this paper, we present an implementation of a pre-fetching mechanism (for global data) that complements the original DTA pre-load mechanism (for consumer-producer data patterns) with the aim of improving non-blocking execution of the threads. Our implementation is based on an enhanced DMA mechanism to prefetch global data. We estimated the benefit and identified the required support of this proposed approach, in an initial implementation. In case of longer latency to access memory, our idea can reduce execution time greatly (i.e., 11x for...
Roberto Giorgi, Zdravko Popovic, Nikola Puzovic
Added 24 May 2010
Updated 24 May 2010
Type Conference
Year 2009
Where IPPS
Authors Roberto Giorgi, Zdravko Popovic, Nikola Puzovic
Comments (0)