While deterministic replay of parallel programs is a powerful technique, current proposals have shortcomings. Specifically, software-based replay systems have high overheads on mu...
Pablo Montesinos, Matthew Hicks, Samuel T. King, J...
There has recently been much interest in stream processing, both in industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), with stream progra...
Jayanth Gummaraju, Mattan Erez, Joel Coburn, Mende...
Recent research advocates using general message predictors to learn and predict the coherence activity in distributed shared memory (DSM). By accurately predicting a message and t...
Today’s chip-level multiprocessors (CMPs) feature up to a hundred discrete cores, and with increasing levels of integration, CMPs with hundreds of cores, cache tiles, and specia...
Boris Grot, Joel Hestness, Stephen W. Keckler, Onu...
— BLAST is a very popular Computational Biology algorithm. Since it is computationally expensive it is a natural target for acceleration research, and many reconfigurable archite...