In this paper, we introduce and evaluate an efficient algorithm for implementing loop partitioning. We start from the results of Agarwal et al. [1], whose aim is to minimize the amount of data accessed during the computation of a tile; this quantity is called the cumulative footprint of the tile. We improve these results in several directions. First, we derive a new formulation of the cumulative footprint that allows an analytical solution of the optimization problem stated in [1]. Second, we handle arbitrary parallelepiped-shaped tiles, whereas [1] considers only rectangular tiles. We design an efficient heuristic to determine the optimal tile shape in this general setting, and we demonstrate its usefulness on both the examples from [1] and a large collection of randomly generated data.
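To make the notion of cumulative footprint concrete, the sketch below counts it by brute force for a two-dimensional loop nest, reading the definition above literally as the number of distinct array elements touched by all iterations of one tile. It is an illustration under stated assumptions, not the analytical formulation developed in this paper or the method of [1]. We assume a tile is given by an integer matrix `P` whose columns are the tile's edge vectors, so that the tile contains the integer points `x` with `P^{-1} x` in `[0, 1)^2`, and that each array reference is affine, `G x + c`. The helper names (`tile_points`, `cumulative_footprint`) and the loop body `A[i][j] = A[i+1][j] + B[i+j]` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def inverse_2x2(P):
    """Exact inverse of a 2x2 integer matrix, computed with Fractions."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("tile matrix is singular")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]


def tile_points(P):
    """Integer points x with P^{-1} x in [0, 1)^2: the half-open
    parallelepiped spanned by the columns of P, anchored at the origin.
    (Assumed tile representation; 2-D only in this sketch.)"""
    Pinv = inverse_2x2(P)
    e1 = (P[0][0], P[1][0])
    e2 = (P[0][1], P[1][1])
    corners = [(0, 0), e1, e2, (e1[0] + e2[0], e1[1] + e2[1])]
    lo = [min(v[k] for v in corners) for k in range(2)]
    hi = [max(v[k] for v in corners) for k in range(2)]
    points = []
    # Scan the bounding box and keep points whose tile coordinates lie in [0, 1).
    for x in product(range(lo[0], hi[0] + 1), range(lo[1], hi[1] + 1)):
        t = [Pinv[r][0] * x[0] + Pinv[r][1] * x[1] for r in range(2)]
        if all(0 <= tk < 1 for tk in t):
            points.append(x)
    return points


def cumulative_footprint(points, refs):
    """Count distinct array elements touched by all iterations in `points`.
    `refs` maps an array name to a list of affine references (G, c),
    each meaning index = G @ x + c."""
    total = 0
    for accesses in refs.values():
        touched = set()
        for x in points:
            for G, c in accesses:
                touched.add(tuple(sum(g * xi for g, xi in zip(row, x)) + ck
                                  for row, ck in zip(G, c)))
        total += len(touched)
    return total


# Hypothetical loop body: A[i][j] = A[i+1][j] + B[i+j]
REFS = {
    "A": [([[1, 0], [0, 1]], (0, 0)), ([[1, 0], [0, 1]], (1, 0))],
    "B": [([[1, 1]], (0,))],
}

if __name__ == "__main__":
    # A 4x3 rectangle and a skewed parallelepiped of the same volume (12).
    for P in ([[4, 0], [0, 3]], [[4, 1], [0, 3]]):
        pts = tile_points(P)
        print(P, len(pts), cumulative_footprint(pts, REFS))
```

Both tiles contain 12 iterations (`|det P|`), yet their footprints differ (21 for the rectangle, 22 for the skewed tile here), which is why the choice of tile shape matters. Enumeration costs time proportional to the tile volume, which is exactly what a closed-form footprint expression avoids.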