This paper presents the study and results of running several core multimedia applications on a simultaneous multithreading (SMT) architecture, including some detailed analysis ranging from memory-bounded kernels to computational-bounded functions. A performance metric to evaluate effective SMT performance gain is introduced, and compared to similar metrics on symmetric multiprocessor (SMP) systems. In addition, we analyze and compare SMT versus SMP systems, and highlight the advantages in the studied applications. The results indicate that sharing the cache in SMT processors can provide better cache locality and thus better performance although sharing the cache can introduce cache conflicts and reduce the actual cache size available for each logical processor. We also propose “mutual prefetching” -- a technique to schedule threads so that they prefetch data for each other in order to reduce cache miss penalty.