Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture

DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium-grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on existing simple cores (in-order pipelines, no branch predictors, no ROBs). In DTA, local variables and synchronization data are communicated via a fast frame memory. If the compiler cannot remove global data accesses, the threads become excessively fragmented. Therefore, in this paper, we present an implementation of a prefetching mechanism (for global data) that complements the original DTA pre-load mechanism (for consumer-producer data patterns), with the aim of improving non-blocking execution of the threads. Our implementation is based on an enhanced DMA mechanism to prefetch global data. In an initial implementation, we estimated the benefit of the proposed approach and identified the support it requires. When memory access latency is longer, our idea can reduce execution time greatly (i.e., 11x for...
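
The idea of prefetching global data so that a thread never stalls on main memory can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the paper's simulator or its results: the names `Thread`, `blocking_cycles`, `dma_cycles`, `local_run_cycles` and `prefetched_cycles`, and the parameters `dma_setup` and `local_latency`, are all hypothetical. It assumes that a blocking core pays the full memory latency on every global read, that one DMA transfer costs a setup time plus one memory latency plus one cycle per word, and that the DMA for the next thread overlaps with the execution of the current one.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    compute_cycles: int    # cycles of pure computation
    global_accesses: int   # number of global-data words the thread reads


def blocking_cycles(threads, mem_latency):
    """Baseline: every global read stalls the in-order core for mem_latency."""
    return sum(t.compute_cycles + t.global_accesses * mem_latency
               for t in threads)


def dma_cycles(t, mem_latency, dma_setup=10):
    """Assumed DMA cost: setup, one memory latency, then one word per cycle."""
    return dma_setup + mem_latency + t.global_accesses


def local_run_cycles(t, local_latency=1):
    """Thread runs non-blocking: all its data is already in local memory."""
    return t.compute_cycles + t.global_accesses * local_latency


def prefetched_cycles(threads, mem_latency):
    """DMA for thread i+1 is overlapped with the execution of thread i."""
    if not threads:
        return 0
    total = dma_cycles(threads[0], mem_latency)  # first prefetch is exposed
    for i, t in enumerate(threads):
        run = local_run_cycles(t)
        nxt = (dma_cycles(threads[i + 1], mem_latency)
               if i + 1 < len(threads) else 0)
        total += max(run, nxt)  # the longer of the two hides the other
    return total


if __name__ == "__main__":
    work = [Thread(compute_cycles=100, global_accesses=20)] * 4
    for latency in (50, 200):
        print(latency,
              blocking_cycles(work, latency),
              prefetched_cycles(work, latency))
```

In this model the gap between the two counts widens as `mem_latency` grows, which matches the qualitative claim in the abstract; the numbers it prints are artifacts of the assumed parameters and should not be read as the paper's measurements.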

Roberto Giorgi, Zdravko Popovic, Nikola Puzovic

Distributed And Parallel Computing | DTA Pre-load Mechanism | Global Data | IPPS 2009 | fine/medium Grained Thread

Added	24 May 2010
Updated	24 May 2010
Type	Conference
Year	2009
Where	IPPS
Authors	Roberto Giorgi, Zdravko Popovic, Nikola Puzovic
