Most behavior recognition methods proposed so far share two limitations: bottom-up analysis and the single-object assumption. Bottom-up analysis can be confused by erroneous or missing image features, and the single-object assumption prevents the analysis of image sequences that contain multiple moving objects. This paper presents a robust behavior recognition method free from these limitations. Our method is characterized by 1) top-down image feature extraction by a selective attention mechanism, 2) object discrimination by colored-token propagation, and 3) integration of multi-viewpoint images. Extensive experiments on human behavior recognition in real-world environments demonstrate the soundness and robustness of our method.