This paper is concerned with learning the canonical gray-scale structure of the images of a class of objects. Structure is defined in terms of the geometry and layout of salient image regions that characterize the given views of the objects. The use of such structure-based learning of object appearance is motivated by the relative stability of image structure over intensity values. A multiscale segmentation tree description is automatically extracted for all sample images, which are then matched to construct a single canonical representative that serves as the model of the class. Different images are selected as prototypes, and each prototype tree is refined to best match the rest of the class. The model tree for the class is that tree which is best supported over all the initializations with different prototypes. Matching is formulated as a problem of finding the best mapping from regions of example images to those of the model tree, and implemented as a problem in incremental refinement of the mode...