This work presents a general methodology for estimating the performance of an HPC workload on a future hardware architecture. It demonstrates the methodology by estimating the performance of a significant scientific application, the Gyrokinetic Toroidal Code (GTC), when executing on Sun’s proposed next-generation petascale computer architecture. For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of the processor and memory systems. This low-level analysis is complemented with scalability estimates based on modeling MPI, OpenMP, and I/O activity in the code. The approach permits accurate end-to-end performance projections from the microarchitecture level to the petascale.

Categories and Subject Descriptors C.4 [Performance of sys