Sciweavers

IPPS
2010
IEEE
13 years 5 months ago
Inter-block GPU communication via fast barrier synchronization
The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages to a programmable processor with many fixed-function components that deliver...
Shucai Xiao, Wu-chun Feng
VLSID
2003
IEEE
14 years 8 months ago
Synthesis of Real-Time Embedded Software by Timed Quasi-Static Scheduling
A formal synthesis method for complex real-time embedded software is proposed in this work. Compared to previous work, our method not only synthesizes embedded software with compl...
Pao-Ann Hsiung, Feng-Shi Su
PLDI
2012
ACM
11 years 10 months ago
Effective parallelization of loops in the presence of I/O operations
Software-based thread-level parallelization has been widely studied for exploiting data parallelism in purely computational loops to improve program performance on multiprocessors...
Min Feng, Rajiv Gupta, Iulian Neamtiu
ICS
1998
Tsinghua U.
13 years 12 months ago
Load Execution Latency Reduction
In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper...
Bryan Black, Brian Mueller, Stephanie Postal, Ryan...
GI
2004
Springer
14 years 1 months ago
Reliability study of an embedded operating system for industrial applications
Critical industrial applications or fault-tolerant applications need operating systems (OS) which guarantee a correct and safe behaviour despite the appearance of errors. In ...
Juan Pardo, José Carlos Campelo, Juan José...