Many techniques have been developed to improve the performance of bulk data transfer protocols, which use large messages. This paper describes a technique that improves the performance of protocols that use small messages, such as signalling protocols, by reducing memory system penalties. Detailed measurements show that for TCP, most memory system costs are due to poor locality in the protocol code itself rather than to the movement of data. We present a new technique, analogous to blocked matrix multiplication, for scheduling layer processing to reduce memory system costs, and analyze its performance in a synthetic environment.
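To make the analogy with blocked matrix multiplication concrete, the following is a minimal sketch of the scheduling idea, not the implementation evaluated in this paper. It assumes that a batch of small messages can be passed through one layer before any of them enters the next layer, so that each layer's code is reused while it is still likely to be cached. The function names, the three toy layers, and the batch size are hypothetical, and the count of layer switches is only a crude proxy for code locality, not a measurement of memory system cost.

```python
# Hypothetical toy layers: each one tags the message with its name.
LAYERS = [
    ("link", lambda m: m + "|link"),
    ("network", lambda m: m + "|network"),
    ("transport", lambda m: m + "|transport"),
]


def process_conventional(messages, layers):
    """Run each message through every layer before starting the next message."""
    results, trace = [], []
    for msg in messages:
        for name, fn in layers:
            msg = fn(msg)
            trace.append(name)  # record which layer's code ran
        results.append(msg)
    return results, trace


def process_blocked(messages, layers, batch_size):
    """Run a whole batch through one layer before moving to the next layer."""
    results, trace = [], []
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        for name, fn in layers:
            batch = [fn(m) for m in batch]
            trace.extend([name] * len(batch))
        results.extend(batch)
    return results, trace


def layer_switches(trace):
    """Count how often consecutive steps execute different layers."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)
```

Both schedules produce the same per-message results. Only the order in which layer code executes differs, so the blocked schedule changes layers far less often for the same set of messages.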