Throughput-Effective On-Chip Networks for Manycore Accelerators


Download www.ece.ubc.ca

As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPUs) increases, so does the importance of on-chip interconnection network design. This paper explores throughput-effective networks-on-chip (NoCs) for future manycore accelerators that employ bulk-synchronous parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is "throughput-effective" if it improves parallel application-level performance per unit chip area. We evaluate the performance of future-looking workloads using detailed closed-loop simulations modeling compute nodes, the NoC, and the DRAM memory system. We start from a mesh design with bisection bandwidth balanced with off-chip demand. Accelerator workloads tend to demand high off-chip memory bandwidth, which results in a many-to-few traffic pattern when coupled with expected technology constraints of slow growth in pins-per-chip. Leveraging these observations, we reduce NoC area by proposing a "...
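
The abstract's definition of "throughput-effective" (application-level performance per unit chip area) can be illustrated with a small calculation. The sketch below is not taken from the paper: the `Design` class, the function names, and every number in it are hypothetical, chosen only to show how a design that gives up a little performance can still come out ahead once its area saving is counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A chip design point. All values used below are hypothetical."""
    name: str
    ipc: float       # application-level throughput (instructions per cycle)
    area_mm2: float  # total chip area in square millimetres


def throughput_effectiveness(design: Design) -> float:
    """Performance per unit chip area, following the abstract's definition."""
    if design.area_mm2 <= 0:
        raise ValueError("area must be positive")
    return design.ipc / design.area_mm2


def relative_gain(baseline: Design, candidate: Design) -> float:
    """Fractional change in throughput-effectiveness versus the baseline."""
    return throughput_effectiveness(candidate) / throughput_effectiveness(baseline) - 1.0


# Hypothetical numbers: the candidate loses 1% performance but saves 4% area.
baseline = Design("baseline mesh", ipc=200.0, area_mm2=500.0)
candidate = Design("reduced-area NoC", ipc=198.0, area_mm2=480.0)

if __name__ == "__main__":
    gain = relative_gain(baseline, candidate)
    print(f"{candidate.name}: {gain:+.2%} throughput-effectiveness vs {baseline.name}")
```

Under this metric an optimization qualifies as throughput-effective whenever `relative_gain` is positive, whether the gain comes from higher performance, smaller area, or both.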

Ali Bakhoda, John Kim, Tor M. Aamodt


Compute Accelerator | Hardware | Manycore Compute Accelerators | MICRO 2010 | Traffic Pattern


Post Info

Added	14 Feb 2011
Updated	14 Feb 2011
Type	Conference
Year	2010
Where	MICRO
Authors	Ali Bakhoda, John Kim, Tor M. Aamodt


Sciweavers
