We present a convex formulation of dictionary learning for sparse signal decomposition. Convexity is obtained by replacing the usual explicit upper bound on the dictionary size by a convex rank-reducing term similar to the trace norm. In particular, our formulation introduces an explicit trade-off between size and sparsity of the decomposition of rectangular matrices. Using a large set of synthetic examples, we compare the estimation abilities of the convex and nonconvex approaches, showing that while the convex formulation has a single local minimum, this may lead in some cases to performance which is inferior to the local minima of the non-convex formulation.