Automatic video annotation via Hierarchical Topic Trajectory Model considering cross-modal correlations