Previous examinations of search in textual archives have assumed that users first retrieve a ranked set of documents relevant to their query and then visually scan through these documents to identify the information they seek. While document scanning is possible in text, it is much more laborious in speech archives, due to the inherently serial nature of speech. Yet, in developing tools for speech access, little attention has so far been paid to users’ problems in scanning and extracting information from within “speech documents”. We demonstrate the extent of these problems in two user studies, showing that users experience severe problems with local navigation when extracting relevant information from within “speech documents”. Based on these results, we propose a new user interface (UI) design paradigm, What You See Is (Almost) What You Hear (WYSIAWYH): a multimodal method for accessing speech archives. This paradigm presents a visual analogue to the underlying speech, e...