As distributed surveillance networks are deployed over larger areas and in increasingly busy environments, it is becoming critical to limit the computation, bandwidth, and human-attention burdens they impose. This paper describes a system that addresses this problem with layered, in-network processing on each camera. This processing filters out uninteresting events locally, so the system does not have to disambiguate and track irrelevant environmental distractors. It is coupled with a factor-graph-based resource allocation algorithm that steers pan-tilt cameras to follow interesting targets while maintaining a “peripheral awareness” of newly emerging targets. We describe this distributed attention mechanism and our implementation of the high-level architecture in a video sensor network.
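To make the allocation idea concrete, here is a minimal sketch of how a factor-graph objective could trade off following interesting targets against peripheral awareness. It is not the paper's algorithm. Everything in it is an illustrative assumption: the discrete pan sectors, the target interest scores, the `PERIPHERAL_WEIGHT` parameter, and the two factor functions. It also maximizes the objective by brute-force enumeration of joint camera assignments, whereas a practical allocator would more likely use message passing (e.g., max-sum) over the graph.

```python
from itertools import product

# Hypothetical discretization: each pan-tilt camera points at one of four sectors.
SECTORS = range(4)

# Hypothetical targets: (sector the target occupies, interest score in [0, 1]).
TARGETS = [(0, 0.9), (1, 0.2), (3, 0.7)]

# Hypothetical weight rewarding cameras for spreading over distinct sectors,
# standing in for "peripheral awareness" of targets that have not appeared yet.
PERIPHERAL_WEIGHT = 0.1


def target_factor(sector, interest, assignment):
    """Credit a target's interest once if at least one camera covers its sector."""
    return interest if sector in assignment else 0.0


def peripheral_factor(assignment):
    """Credit each distinct sector covered, so cameras do not all pile onto one target."""
    return PERIPHERAL_WEIGHT * len(set(assignment))


def score(assignment):
    """Global objective: sum of all factors for a joint camera assignment."""
    return sum(target_factor(s, i, assignment) for s, i in TARGETS) + peripheral_factor(assignment)


def best_assignment(num_cameras):
    """Exact maximization by enumerating every joint assignment (tiny problems only)."""
    return max(product(SECTORS, repeat=num_cameras), key=score)


if __name__ == "__main__":
    print(best_assignment(2))  # two cameras cover the two most interesting targets
    print(best_assignment(3))  # a third camera picks up the low-interest target too
```

With two cameras the sketch covers sectors 0 and 3, which hold the highest-interest targets. Adding a third camera lets it cover sector 1 as well. Raising `PERIPHERAL_WEIGHT` would push cameras further toward spreading out rather than concentrating on known targets.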