Uniprocessor simulators track resource utilization cycle by cycle to estimate performance. Multiprocessor simulators, however, must account for synchronization events that increas...
Benjamin C. Lee, Jamison D. Collins, Hong Wang 000...
This paper presents an algorithm for implementing optimal hardware-based multicast trees, on networks that provide hardware support for collective communication. Although the prop...
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
Critical sections in database storage engines impact performance and scalability more as the number of hardware contexts per chip continues to grow exponentially. With enough thre...
Ryan Johnson, Ippokratis Pandis, Anastasia Ailamak...
Most modern parallel computers are clusters using Myrinet or Ethernet communication networks. Several studies have been published comparing the performance of these two networks f...