Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications h...
DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on exi...
This paper presents an analytical taxonomy that can suitably describe, rather than simply classify, techniques for data presentation. Unlike previous works, we do not consider par...
In video-on-demand (VOD) applications, it is desirable to provide the user with the video-cassette-recorder-like (VCR) capabilities such as fast-forwarding a video or jumping to a...
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache con...