Multithreading is an important software modularization technique. However, it can incur substantial overheads, especially in processors where the amount of architecturally visible state is large. We propose an implementation technique for cooperative multithreading, where context switches occur in places that minimize the amount of state that needs to be saved. The subset of processor state saved during each context switch is based on where the switch occurs. We have validated the approach by an empirical study of resource usage in basic blocks, and by implementing the co-operative threading in our compiler. Performance figures are given for an MP3 player utilizing the threading implementation.