This paper presents a hybrid approach to the automatic parallelization of computer programs that combines static extraction of threads (tasks) with dynamic scheduling for parallel and distributed execution. Fine-grain scheduling decisions are made at compile time, and coarse-grain scheduling decisions are made at run time. The approach has two components: compiler technology that performs the static analysis (thread extraction), and an architecture that is responsible for scheduling and distributing the threads. Each processor is augmented with a broker whose job is to shop for tasks for that processor to perform. The approach aims to provide an adaptive run-time distribution of computation for irregular problems such as the simulation of embedded systems. Finally, it is general enough to allow the seamless incorporation of heterogeneous hardware, in particular dynamically reconfigurable hardware such as FPGAs.
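To make the broker idea concrete, the following is a minimal Python sketch of coarse-grain run-time scheduling by per-processor brokers. It is an illustration under stated assumptions, not the paper's architecture: it assumes tasks have already been extracted at compile time with a fixed work estimate, that all brokers shop from a single shared FIFO "market", and that heterogeneous processors (for example a CPU and an FPGA) differ only in a speed factor. The names `Task`, `Processor`, `Broker`, `shop`, and `run` are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    work: int  # hypothetical: work estimate fixed at compile time


@dataclass
class Processor:
    name: str
    speed: int  # hypothetical: work units per time step
    completed: list = field(default_factory=list)


class Broker:
    """Hypothetical broker attached to one processor; it shops for tasks."""

    def __init__(self, processor, market):
        self.processor = processor
        self.market = market  # shared pool of statically extracted tasks

    def shop(self):
        # Take the next available task from the shared market, if any.
        return self.market.popleft() if self.market else None


def run(tasks, processors):
    """Simulate brokers shopping whenever their processor becomes idle."""
    market = deque(tasks)
    brokers = [Broker(p, market) for p in processors]
    # Event queue of (time the processor becomes idle, broker index).
    events = [(0, i) for i in range(len(brokers))]
    heapq.heapify(events)
    makespan = 0
    while events:
        now, i = heapq.heappop(events)
        task = brokers[i].shop()
        if task is None:
            continue  # market is empty; this processor stops shopping
        proc = brokers[i].processor
        finish = now + -(-task.work // proc.speed)  # ceiling division
        proc.completed.append(task.name)
        makespan = max(makespan, finish)
        heapq.heappush(events, (finish, i))
    return makespan


# Usage: one slow general-purpose processor and one fast accelerator.
tasks = [Task("t0", 8), Task("t1", 4), Task("t2", 4), Task("t3", 2)]
procs = [Processor("cpu", speed=1), Processor("fpga", speed=4)]
makespan = run(tasks, procs)
print(makespan, [p.completed for p in procs])  # 8 [['t0'], ['t1', 't2', 't3']]
```

Because each broker shops only when its own processor is idle, the faster processor ends up buying more tasks without any central planner, which is the kind of adaptive run-time distribution the approach targets. A real broker would presumably also weigh data locality and communication cost, which this sketch ignores.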