IEEEPACT
2002
IEEE

Optimizing Loop Performance for Clustered VLIW Architectures

Download www.cs.mtu.edu

Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-performance embedded processor with high ILP generally places heavy demands on register resources, making it difficult to maintain a single, multi-ported register bank. To address this problem, some architectures, e.g., the Texas Instruments TMS320C6x, partition the register bank into multiple banks, each directly connected to only a subset of the functional units. These functional unit/register bank groups are called clusters. Clustered architectures require that either copy operations or delay slots be inserted when an operation accesses data stored on a different cluster. To generate excellent code for such architectures, the compiler must not only spread the computation across clusters to achieve maximum parallelism but also limit the effects of intercluster data transfers. Loop unrolling and ...
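
The cost that the abstract attributes to intercluster data transfers can be illustrated with a small sketch. It is not the authors' algorithm: it only counts how many copy operations a given cluster assignment would need, under the assumption that a value is copied at most once to each remote cluster that uses it. The function name, the operation names, and the two-cluster assignments are all hypothetical.

```python
from collections import defaultdict


def count_intercluster_copies(deps, cluster_of):
    """Count copy operations implied by a cluster assignment.

    deps: iterable of (producer, consumer) operation pairs.
    cluster_of: dict mapping each operation name to a cluster id.
    Assumption: one copy per (value, remote cluster) pair, however many
    consumers on that remote cluster read the value.
    """
    remote_uses = defaultdict(set)
    for producer, consumer in deps:
        src, dst = cluster_of[producer], cluster_of[consumer]
        if src != dst:
            remote_uses[producer].add(dst)
    return sum(len(dsts) for dsts in remote_uses.values())


# Hypothetical loop body: two loads feed a multiply, which feeds an add.
deps = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add")]

# Everything on one cluster: no copies, but no cross-cluster parallelism.
all_on_one = {"load_a": 0, "load_b": 0, "mul": 0, "add": 0}

# Work spread over two clusters: more parallelism, but copies are needed.
split = {"load_a": 0, "load_b": 1, "mul": 0, "add": 1}

print(count_intercluster_copies(deps, all_on_one))
print(count_intercluster_copies(deps, split))
```

Comparing the two assignments shows the tension the abstract describes: spreading operations across clusters exposes parallelism, while each dependence that crosses a cluster boundary adds transfer overhead the compiler has to limit.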

Yi Qian, Steve Carr, Philip H. Sweany

Distributed And Parallel Computing | IEEEPACT 2002 | Intercluster Data Transfers | Loop Unrolling | Register Bank

Related Content

» Partitioned Schedules for Clustered VLIW Architectures

» Instruction buffering exploration for low energy VLIWs with instruction clusters

» Compiler-assisted leakage energy optimization for clustered VLIW architectures

» A loop accelerator for low power embedded VLIW processors

» CALiBeR: A Software Pipelining Algorithm for Clustered Embedded VLIW Processors

» VHC: Quickly Building an Optimizer for Complex Embedded Architectures

» Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures

» Distributed loop controller architecture for multi-threading in uni-threaded VLIW processors

» Vector vs superscalar and VLIW architectures for embedded multimedia benchmarks

Post Info

Added	15 Jul 2010
Updated	15 Jul 2010
Type	Conference
Year	2002
Where	IEEEPACT
Authors	Yi Qian, Steve Carr, Philip H. Sweany
