Although platform-independent runtime systems for parallel programming languages are desirable, the need for low-level optimizations usually precludes their existence. This is because most optimizations involve some combination of low-level communication and low-level threading, and the result is almost always platform-dependent. We propose a solution to the threading half of this dilemma: a thread package that allows fine-grain control over the behavior of the threads while still providing performance comparable to hand-tuned, machine-dependent thread packages. This makes it possible to construct platform-independent thread modules for parallel runtime systems and, more importantly, to optimize them.