Multicore processors represent an architectural paradigm shift that promises a dramatic increase in performance, but they also bring an unprecedented level of complexity to algorithmic design and software development. In this paper, we describe the challenges involved in designing a Breadth-First Search (BFS) algorithm for the Cell Broadband Engine (Cell/BE) processor. The proposed methodology combines a high-level algorithmic design, which captures the machine-independent aspects to guarantee portability with performance to future processors, with an implementation that embeds processor-specific optimizations. Using a fine-grained global coordination strategy derived from the Bulk-Synchronous Parallel (BSP) model, we have derived an accurate performance model that has guided the implementation and optimization of our algorithm. Our experiments show almost linear scaling with the number of synergistic processing elements used on the Cell/BE platform, and the results compare favorably against other...