We classify an image by generating a list of salient visual features present in the luminance channel, and matching the resulting variable-length feature list to category-specific generative models for such features. To facilitate quick computation, we use thresholded Viola-Jones rectangular features, each represented by a five-dimensional descriptor. For each image category, a probability distribution over feature lists is given by a latent conditional independence (LCI) model, and classification is by maximum likelihood. On the NIST tax forms database [3], where intra-category variations include variable scan lightness, skew, noise, and machine-printed form-filling, our method improves on published results while requiring very little training data and without relying on an extensive set of handcrafted features.
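
As a rough illustration of the feature-extraction step, the following sketch computes a two-rectangle Viola-Jones response from an integral image and keeps only the window positions whose response magnitude exceeds a threshold. It is a minimal sketch under stated assumptions, not the implementation evaluated here: the function names, the single horizontal two-rectangle feature type, the fixed window size, and the five-tuple `(x, y, w, h, sign)` used as the descriptor are all hypothetical choices made for the example.

```python
import numpy as np

def integral_image(img):
    # Zero-padded summed-area table: ii[r, c] is the sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of the w-by-h rectangle with top-left corner (x, y), in O(1).
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_horizontal(ii, x, y, w, h):
    # Left half minus right half of the window; w is assumed to be even.
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def salient_features(img, w, h, stride, threshold):
    # Slide a w-by-h window over the luminance image and keep the positions
    # whose response magnitude reaches the threshold. The five-tuple
    # (x, y, w, h, sign) is an assumed descriptor layout for illustration.
    ii = integral_image(img)
    feats = []
    for y in range(0, img.shape[0] - h + 1, stride):
        for x in range(0, img.shape[1] - w + 1, stride):
            r = two_rect_horizontal(ii, x, y, w, h)
            if abs(r) >= threshold:
                feats.append((x, y, w, h, 1 if r > 0 else -1))
    return feats
```

The length of the returned list depends on the image content, which is why the category models must assign probabilities to variable-length feature lists.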