Continued scaling of CMOS technology to smaller transistor sizes makes modern processors more susceptible to both transient and permanent hardware faults. Circuitlevel techniques ...
Trace caches enable high bandwidth, low latency instruction supply, but have a high miss penalty and relatively large working sets. Consequently, their performance may suffer due ...
Programming distributed-memory machines requires careful placement of datato balance the computationalload among the nodes and minimize excess data movement between the nodes. Mos...
As bioinformatics is an emerging application of high performance computing, this paper first evaluates the memory performance of several representative bioinformatics application...
Guangming Tan, Lin Xu, Shengzhong Feng, Ninghui Su...
We present Vector LLVA, a virtual instruction set architecture (VISA) that exposes extensive static information about vector parallelism while avoiding the use of hardware-speci...