Food recognition is difficult because food items are deformable objects that exhibit significant variations in appearance. We believe the key to recognizing food is to exploit the spatial relationships between different ingredients (such as meat and bread in a sandwich). We propose a new representation for food items that computes pairwise statistics between local features, where the features are computed over a soft pixel-level segmentation of the image into eight ingredient types. We accumulate these statistics in a multi-dimensional histogram, which is then used as the feature vector for a discriminative classifier. Our experiments show that the proposed representation is significantly more accurate at identifying food than existing methods.
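To make the idea of a pairwise-statistics histogram concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each local feature has a location and a length-8 soft ingredient-probability vector, sampled from the soft segmentation. It also assumes that the only pairwise statistic is the quantized Euclidean distance between two features. The function name `pairwise_histogram` and the parameters `num_dist_bins` and `max_dist` are hypothetical. Each ordered pair of features adds weight `p_i[a] * p_j[c]` to the bin `(a, c, distance_bin)`, so the soft labels spread each pair's vote across ingredient-type combinations.

```python
import math
from itertools import permutations

NUM_TYPES = 8  # eight ingredient types, as described in the text


def pairwise_histogram(points, soft_labels, num_dist_bins=4, max_dist=100.0):
    """Accumulate a (type_a, type_c, distance_bin) histogram over feature pairs.

    points: list of (x, y) locations of local features.
    soft_labels: list of length-8 probability vectors, one per feature,
        assumed to be read from a soft pixel-level segmentation.
    num_dist_bins, max_dist: hypothetical quantization parameters; the
        distance is the only pairwise statistic used in this sketch.
    Returns a flat, normalized list of length NUM_TYPES**2 * num_dist_bins.
    """
    hist = [0.0] * (NUM_TYPES * NUM_TYPES * num_dist_bins)
    # Ordered pairs (i, j) and (j, i) are both visited, so the
    # histogram is symmetric in the two ingredient types.
    for i, j in permutations(range(len(points)), 2):
        (xi, yi), (xj, yj) = points[i], points[j]
        d = math.hypot(xj - xi, yj - yi)
        # Quantize the distance; anything beyond max_dist goes in the last bin.
        b = min(int(d / max_dist * num_dist_bins), num_dist_bins - 1)
        for a in range(NUM_TYPES):
            for c in range(NUM_TYPES):
                # Soft vote: joint probability that feature i is type a
                # and feature j is type c.
                w = soft_labels[i][a] * soft_labels[j][c]
                if w:
                    hist[(a * NUM_TYPES + c) * num_dist_bins + b] += w
    # Normalize so that images with different numbers of features
    # produce comparable feature vectors.
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Example: one feature that is certainly type 0 and one that is certainly type 1.
vec = pairwise_histogram([(0, 0), (10, 0)], [[1] + [0] * 7, [0, 1] + [0] * 6])
```

The flat vector returned by the function is what would be passed to a discriminative classifier. The paper's richer pairwise statistics are not reproduced here. The choice of distance bins is illustrative only.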