
PPL
2008

Using Hardware Multithreading to Overcome Broadcast/Reduction Latency in an Associative SIMD Processor

The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of ...
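The scheduling effect described in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator and does not reproduce their three policies; it is a minimal illustration under stated assumptions. Each hardware thread is assumed to have a fixed per-instruction latency (a hypothetical stand-in for broadcast/reduction latency), one instruction can issue per cycle, and the function name `simulate` and its parameters are invented for this example. It compares a strict round-robin scheduler, which wastes the cycle when the selected thread is not ready, with a stall-aware scheduler that skips to the next ready thread.

```python
def simulate(latencies, cycles, stall_aware):
    """Return the fraction of cycles in which an instruction was issued.

    latencies[t] is a hypothetical fixed latency for thread t: after issuing
    at cycle c, thread t cannot issue again until cycle c + latencies[t].
    This is an assumed toy model, not the processor from the paper.
    """
    n = len(latencies)
    ready_at = [0] * n   # earliest cycle at which each thread may issue
    issued = 0
    nxt = 0              # round-robin pointer

    for cycle in range(cycles):
        if stall_aware:
            # Consider every thread, starting at the round-robin pointer.
            candidates = [(nxt + k) % n for k in range(n)]
        else:
            # Strict round robin: only the thread under the pointer.
            candidates = [nxt]

        for t in candidates:
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latencies[t]
                issued += 1
                nxt = (t + 1) % n
                break
        else:
            # No candidate was ready: the issue slot is wasted this cycle.
            nxt = (nxt + 1) % n

    return issued / cycles


if __name__ == "__main__":
    # One long-latency thread and three short-latency threads (assumed values).
    lat = [8, 1, 1, 1]
    print("strict round robin:", simulate(lat, 64, stall_aware=False))
    print("stall-aware:       ", simulate(lat, 64, stall_aware=True))
```

In this toy setup the strict scheduler loses an issue slot each time the pointer reaches the long-latency thread before it is ready, while the stall-aware scheduler fills that slot with another ready thread. The numbers depend entirely on the assumed latencies and say nothing about the results reported in the paper.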
Kevin Schaffer, Robert A. Walker
Added 14 Dec 2010
Updated 14 Dec 2010
Type Journal
Year 2008
Where PPL