Using Hardware Multithreading to Overcome Broadcast/Reduction Latency in an Associative SIMD Processor


The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of ...
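The latency-hiding argument in the abstract can be illustrated with a toy cycle-level model. The sketch below is not the authors' simulator, and the abstract does not name its three scheduling policies; the two policies here (`naive` and `ready_aware`), the function `simulate`, and all parameters are assumptions made for illustration. It assumes each thread's next instruction depends on a broadcast/reduction result that becomes available `latency` cycles after issue, and it reports the fraction of cycles in which some thread issued.

```python
def simulate(num_threads, latency, cycles, policy):
    """Toy model: return the fraction of cycles in which an instruction issued.

    Assumption (not from the paper): after a thread issues at cycle c, its
    next instruction cannot issue before cycle c + latency.
    """
    ready_at = [0] * num_threads  # earliest cycle at which each thread may issue
    next_thread = 0               # round-robin pointer
    issued = 0

    for cycle in range(cycles):
        if policy == "naive":
            # Strict round-robin: consider only the next thread, even if it
            # would stall; a stalled pick wastes the issue slot.
            candidates = [next_thread]
        elif policy == "ready_aware":
            # Scan every thread in round-robin order and skip stalled ones.
            candidates = [(next_thread + k) % num_threads
                          for k in range(num_threads)]
        else:
            raise ValueError(f"unknown policy: {policy}")

        chosen = next((t for t in candidates if ready_at[t] <= cycle), None)
        if chosen is not None:
            issued += 1
            ready_at[chosen] = cycle + latency

        if policy == "naive":
            # The pointer advances every cycle, whether or not a thread issued.
            next_thread = (next_thread + 1) % num_threads
        elif chosen is not None:
            # Resume scanning just after the thread that issued.
            next_thread = (chosen + 1) % num_threads

    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        for policy in ("naive", "ready_aware"):
            print(threads, policy, simulate(threads, 8, 800, policy))
```

In this model, utilization under the ready-aware policy rises with the thread count until there are at least as many threads as cycles of latency, after which the issue slot is always filled. Real hardware also has pipeline hazards and synchronization stalls that this sketch ignores.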

Kevin Schaffer, Robert A. Walker


Associative Program | Associative Simd Processor | PPL 2008 | Simd Processor |


» Evaluating the impact of simultaneous multithreading on network servers using real hardwar...

» A GPU-inspired soft processor for high-throughput acceleration

» Analyzing CUDA workloads using a detailed GPU simulator

» Instruction fetch deferral using static slack

Post Info

Added	14 Dec 2010
Updated	14 Dec 2010
Type	Journal
Year	2008
Where	PPL
Authors	Kevin Schaffer, Robert A. Walker
