Sciweavers

309 search results - page 19 / 62
» Parallel Memory Architecture for Arbitrary Stride Accesses
Sort
View
IEEEPACT
2009
IEEE
14 years 2 months ago
Architecture Support for Improving Bulk Memory Copying and Initialization Performance
—Bulk memory copying and initialization is one of the most ubiquitous operations performed in current computer systems by both user applications and Operating Systems. While many...
Xiaowei Jiang, Yan Solihin, Li Zhao, Ravishankar I...
SPDP
1991
IEEE
13 years 11 months ago
Local vs. global memory in the IBM RP3: experiments and performance modelling
A number of experiments regarding the placement of instructions, private data and shared data in the Non-Uniform-Memory-Access multiprocessor, RP3 has been performed. Three Scient...
Mats Brorsson
ICS
1998
Tsinghua U.
13 years 11 months ago
Load Execution Latency Reduction
In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper...
Bryan Black, Brian Mueller, Stephanie Postal, Ryan...
CODES
2009
IEEE
13 years 8 months ago
A scalable parallel H.264 decoder on the cell broadband engine architecture
The H.264 video codec provides exceptional video compression while imposing dramatic increases in computational complexity over previous standards. While exploiting parallelism in...
Michael A. Baker, Pravin Dalale, Karam S. Chatha, ...
CAMP
2000
IEEE
13 years 12 months ago
An FPGA Architecture for High Speed Edge and Corner Detection
This paper presents an FPGA based architecture for high speed edge and corner detection. Applications targeted are in high speed computer vision (i.e. more than 100 images per sec...
Cesar Torres-Huitzil, Miguel Arias-Estrada