Abstract. This article describes a method for document/speech alignment based on explicit verbal references to documents and parts of documents, in the context of multimodal meetings. The article focuses on the two main stages of dialogue processing for alignment: the detection of expressions referring to documents in transcribed speech, and the recognition of the documents and document elements that they refer to. A detailed evaluation of the implemented modules, first separately and then in a pipeline, shows that results are well above baseline values. Finally, the integration of this method with other techniques for document/speech alignment is discussed.