It is becoming apparent that the next-generation IP route lookup architecture needs to achieve speeds of 100 Gbps and beyond while supporting both IPv4 and IPv6 with fast real-time ...
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
The now-commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user frien...
High-fidelity rendering via ray tracing requires tracing incoherent rays for global illumination and other secondary effects. Recent research shows that the performance benefits fr...
With increasing process variation, binning has become an important technique to improve the values of fabricated chips, especially in high performance microprocessors where transpa...