Just like humans, conversational computer systems should not listen silently to their input and then respond. Instead, they should enforce the speaker-listener link by attending actively and giving feedback on an utterance while perceiving it. Most existing systems produce direct feedback responses to decisive (e.g. prosodic) cues. We present a framework that conceives of feedback as a more complex system, resulting from the interplay of conventionalized responses to eliciting speaker events and the multimodal behavior that signals how internal states of the listener evolve. A model for producing such incremental feedback, based on multi-layered processes for perceiving, understanding, and evaluating input, is described.