We describe the design, implementation, and performance of a Web server accelerator that runs on an embedded operating system and improves Web server performance by caching data. It can serve Web data at rates an order of magnitude higher than a high-performance Web server running on similar hardware under a conventional operating system such as Unix or NT. This superior performance results in part from the accelerator's highly optimized communications stack. To maximize hit rates and keep caches up to date, the accelerator provides an API that allows application programs to explicitly add, delete, and update cached data. Because applications control the cache contents directly, the accelerator can cache dynamic as well as static data. We also describe how the accelerator can be scaled to multiple processors to increase performance and availability. The basic design alternatives include a content router or a TCP router (without content routing) in front of a set of Web cache...
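
To make the idea of an explicit cache API concrete, the following is a minimal sketch of how an application might add, update, and delete cached objects so that dynamic pages stay current. It is not the accelerator's actual interface: the class name `AcceleratorCache`, its method names, and the hit and miss counters are all hypothetical, and the cache here is a plain in-memory dictionary rather than an embedded system with its own communications stack.

```python
from typing import Dict, Optional


class AcceleratorCache:
    """Hypothetical stand-in for an accelerator cache (illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}  # URL -> cached response body
        self.hits = 0
        self.misses = 0

    def add(self, url: str, body: bytes) -> None:
        """Explicitly place an object (static or dynamic) in the cache."""
        self._store[url] = body

    def update(self, url: str, body: bytes) -> bool:
        """Replace a cached object in place; return False if it is not cached."""
        if url not in self._store:
            return False
        self._store[url] = body
        return True

    def delete(self, url: str) -> bool:
        """Remove a cached object; return False if it was not cached."""
        return self._store.pop(url, None) is not None

    def lookup(self, url: str) -> Optional[bytes]:
        """Serve a request from the cache, counting hits and misses."""
        body = self._store.get(url)
        if body is None:
            self.misses += 1
        else:
            self.hits += 1
        return body


if __name__ == "__main__":
    cache = AcceleratorCache()
    # The application generates a dynamic page once and caches it explicitly.
    cache.add("/scores", b"<html>Home 1, Away 0</html>")
    # When the underlying data changes, the application pushes the new version
    # instead of waiting for the cached copy to expire.
    cache.update("/scores", b"<html>Home 1, Away 1</html>")
    print(cache.lookup("/scores"))
    # When the page is no longer valid, the application removes it.
    cache.delete("/scores")
    print(cache.lookup("/scores"))
```

In this sketch the application, not a timeout, decides when a cached object changes or disappears, which is what lets dynamically generated pages be cached without serving stale copies.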