The content of an image can be summarized by a set of homogeneous regions in an appropriate feature space. When exact shape is not important, the regions can be represented by simple "blobs". Even for similar images, the blob representations of the two images might differ in the shape, position, and number of blobs, and in the features they represent. In addition, separate blobs in one image might correspond to a single blob in the other image, and vice versa. In this paper, we present the BlobEMD framework, a novel method for computing the dissimilarity of two sets of blobs while allowing for context-based adaptation of the image representation. The resulting representations describe the original images well and, at the same time, are best aligned with the representations of the context images. We compute the blobs using Gaussian mixture modeling and use the Earth Mover's Distance (EMD) to compute both the dissimilarity of the images and the flow matrix of the blobs ...
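To make the EMD step described above concrete, the following is a minimal sketch, not the BlobEMD implementation itself. It assumes each blob has already been reduced to a weight and a mean vector (for example, the mixing weight and mean of one Gaussian mixture component), and it assumes the Euclidean distance between blob means as the ground distance; the abstract does not specify the ground distance actually used. The function name `emd` and the toy blob values are hypothetical. The EMD is solved as a transportation linear program with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def emd(weights_a, weights_b, ground_dist):
    """Return (emd_value, flow) for two weighted blob sets.

    ground_dist[i, j] is the cost of moving one unit of weight
    from blob i of the first set to blob j of the second set.
    """
    w_a = np.asarray(weights_a, dtype=float)
    w_b = np.asarray(weights_b, dtype=float)
    d = np.asarray(ground_dist, dtype=float)
    m, n = d.shape

    # Flow variables f_ij are flattened row-major: index = i * n + j.
    a_ub = np.zeros((m + n, m * n))
    for i in range(m):
        a_ub[i, i * n:(i + 1) * n] = 1.0  # blob i sends at most w_a[i]
    for j in range(n):
        a_ub[m + j, j::n] = 1.0           # blob j receives at most w_b[j]
    b_ub = np.concatenate([w_a, w_b])

    # The total flow equals the smaller of the two total weights.
    total = min(w_a.sum(), w_b.sum())
    a_eq = np.ones((1, m * n))
    b_eq = [total]

    res = linprog(d.ravel(), A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    flow = res.x.reshape(m, n)
    return res.fun / total, flow


# Toy example (hypothetical values): two blobs in image A, one blob in image B.
means_a = np.array([[0.0, 0.0], [2.0, 0.0]])
means_b = np.array([[1.0, 0.0]])
weights_a = [0.5, 0.5]
weights_b = [1.0]

# Assumed ground distance: Euclidean distance between blob means.
dist = np.linalg.norm(means_a[:, None, :] - means_b[None, :, :], axis=2)

value, flow = emd(weights_a, weights_b, dist)
print(value)
print(flow)
```

In this toy case, both blobs of the first image send their entire weight to the single blob of the second image, so the flow matrix records a two-to-one correspondence between blobs, the situation the text describes.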