Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshalling etc. Traditional sequential algorithms use a worklist, replacing a nodes with their unvisited children. Previous work on parallel tracing is processororiented in associating one worklist per processor: worklist insertion and removal requires no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-readersingle-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality...
Cosmin E. Oancea, Alan Mycroft, Stephen M. Watt