In this paper, we propose a multithreaded processor architecture which improves machine throughput. In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there are functional unit conflicts. This parallel execution scheme greatly improves the utilization of the functional unit. Simulation results show that by executing two and four threads in parallel on a nine-functional-unit processor, a 2.02 and a 3.72 times speed-up, respectively, can be achieved over a conventional RISC processor. Our architecture is also applicable to the efficient execution of a single loop. In order to control functional unit conflicts between loop iterations, we have developed a new static code scheduling technique. Another loop execution scheme, by using the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult t...