We present an efficient method for volume rendering by raycasting on the CPU. We employ coherent packet traversal of an implicit bounding volume hierarchy, heuristically pruned using preintegrated transfer functions, to exploit empty or homogeneous space. We also detail SIMD optimizations for volumetric integration, trilinear interpolation, and gradient lighting. The resulting system performs well on low-end and laptop hardware, and can outperform out-of-core GPU methods by orders of magnitude when rendering large volumes without level-of-detail (LOD) on a workstation. We show that, while slower than GPU methods for low-resolution volumes, an optimized CPU renderer does not require LOD to achieve interactive performance on large data sets.