The traditional target machine of a parallelizing compiler can execute code sections either serially or in parallel. In contrast, targeting the generated code to a speculative parallel processor allows the compiler to recognize parallelism to the best of its abilities and leave other optimization decisions up to the processor's runtime detection mechanisms. In this paper we show that simple improvements of the compiler's speculative task selection method can already lead to signi cant up to 55 improvement in speedup over that of a simple code generator for a Multiscalar architecture. For an even more improved software hardware cooperation we propose an interface that allows the compiler to inform the processor about fully parallel, serial, and speculative code sections as well as attributes of program variables. We have evaluated the degrees of parallelism that such a co-design can realistically exploit.