At FXPAL Japan we have built an experimental Smart Conference Room (SCR) that contains multiple cameras, microphones, displays, and capture devices. Drawing on this experience, we discuss research and open issues in constructing such SCRs for the purpose of automatic content analysis. Our discussion is grounded in a novel conceptual meeting model that consists of physical (from layout to cameras), conceptual (meeting types, actors), sensory (audio-visual capture), and content (syntax and semantics) components. We also discuss storage, retrieval, and deployment issues.