This paper introduces video quality analysis for automated video capture and editing. Previously, we proposed an automated video capture and editing system for conversation scenes. In the capture phase, our system not only produces concurrent video streams with multiple pan-tilt-zoom cameras but also recognizes "conversation states," i.e., who is speaking, when someone is nodding, and so on. Because the automated editing phase relies on these conversation states, it is important to clarify how their recognition rate affects the quality of the videos produced by our editing system. In the present study, we analyzed the relationship between the recognition rate of conversation states and the quality of the resultant videos through subjective evaluation experiments. The quality scores of the resultant videos were almost the same as in the best case, in which recognition was done manually, and the recognition rate of our capture system was therefore confirmed to be sufficient for automated editing.