Abstract. This paper addresses a new framework for designing and implementing skeleton libraries, in which each skeleton should not only be efficiently implemented as is usually done, but also be equipped with a structured interface to combine it efficiently with other skeletons. We illustrate our idea with a new skeleton library for parallel programming in C++. It is simple and efficient to use just like other C++ libraries. A distinctive feature of the library is its modularity: Our optimization framework treats newly defined skeletons equally to existing ones if the interface is given. Our current experiments are encouraging, indicating that this approach is promising both theoretically and in practice.