Instruction-level traces are widely used for program and hardware analysis. However, program traces for just a few seconds of execution are enormous, up to several terabytes in siz...
Transactional memory has great potential for simplifying multithreaded programming by allowing programmers to specify regions of the program that must appear to execute ...
Chip multiprocessors are of increasing importance due to recent difficulties in achieving higher clock frequencies in uniprocessors, but their success depends on finding useful wor...
Guilherme Ottoni, Ram Rangan, Adam Stoler, Matthew...
User-level communication alleviates the software overhead of the communication subsystem by allowing applications to access the network interface directly. For that purpose, effici...
We propose and evaluate User-Driven Frequency Scaling (UDFS) for improved power management on processors that support Dynamic Voltage and Frequency Scaling (DVFS), e.g., ...
Arindam Mallik, Bin Lin, Gokhan Memik, Peter A. Di...
Modern programming languages often include complex mechanisms for dynamic memory allocation and garbage collection. These features drive the need for more efficient implementation ...
Multiple core designs have become commonplace in the processor market, and are hence a major focus in modern computer architecture research. Thus, for both product development and ...
This paper makes a case for using multi-core processors to simultaneously achieve transient-fault tolerance and performance enhancement. Our approach is extended from a recent late...
Hardware predictor designers have incorporated hysteresis and/or bias to achieve desired behavior by increasing the number of bits per counter. Some resulting proposed predictor de...
We present a pipelined approach to hardware implementation of the Aho-Corasick (AC) algorithm for string matching called P-AC. By incorporating pipelined processing, the state grap...