In this user study, we address several open issues in the design of waiting cues for system response time (SRT) in interactive telephony speech applications. User observations and structured preference tests indicate that silent waiting times should not be longer than 4 to 8 seconds. Even at short durations, music combined with speech was preferred to silence. A preference test of several non-speech waiting cues proposed in the literature suggests that music is preferred to simpler synthetic sounds and to natural sounds. Continuous indication of the remaining waiting time by speech was rated as much more pleasant and appropriate than a non-speech audio progress meter. Subjects did not accept commercial announcements or navigational advice during waiting times. Empirically based guidelines for a maximum waiting duration in voice services are given. Implications for the design of auditory waiting cues for SRT are discussed.

Author Keywords
Speech I/O; Auditory I/O...