Contemporary TV news programs often use multi-level anchorpersons, which reflect the inherent hierarchical structure of a news program. However, these diverse anchorperson patterns cause conventional anchorperson detection algorithms to fail. In this paper, we propose a robust approach to anchorperson detection that integrates the visual modality, the auditory modality, and the human appearance modality into multimodal associated clustering. Based on the structure of the clustered multi-level anchorpersons, the ToC (Table of Contents) of a news video can be generated effectively. Experiments on five hours of news programs from different TV channels demonstrate the effectiveness and robustness of the proposed approach.