Appropriate clustering of objects into pages in secondary memory is crucial to achieving good performance in a persistent object store. We present a new approach, termed semantic clustering, that exploits more of a program's data access semantics than previous proposals. We insulate the source code from changes in clustering, so that clustering affects only performance. The linguistic constructs used to specify semantic clustering are illustrated with an example of two tools with quite different access patterns. Experimentation with this example indicates that, for the tools, object sizes, and hardware configuration considered here, performing any clustering at all yields an order-of-magnitude improvement in overall tool execution time over pure page faulting, and that semantic clustering is 20% faster than other forms of clustering.
Karen Shannon, Richard T. Snodgrass