We present a full-reference and a no-reference perceptual video quality metric that incorporate both low-level and high-level aspects of vision. Low-level aspects include color perception, contrast sensitivity, masking, and artifact analysis. High-level aspects use semantic segmentation to account for the cognitive behavior of an observer watching a video. Using the special case of semantic face segmentation, we evaluate the proposed segmentation-driven perceptual quality metrics on a range of test sequences and demonstrate an improvement in their prediction performance.