Abstract. A parallel Lattice Boltzmann Method (pLBM) based on hierarchical spatial decomposition is designed to perform large-scale flow simulations. The algorithm uses a critical-section-free dual representation to expose maximal concurrency and data locality. The performance of emerging multi-core platforms, PlayStation3 (Cell Broadband Engine) and Compute Unified Device Architecture (CUDA), is tested using the pLBM, which is implemented with multithreaded and message-passing programming. The results