We build a general visibility model for video packets that applies to various GOP (Group of Pictures) structures. The data used to analyze and fit the model come from three sets of subjective experiments with different encoding and decoding parameters on H.264 and MPEG-2 videos. We consider factors not only within a packet but also in its temporal and spatial vicinity, to account for possible temporal and spatial masking effects. The model can be used by an intermediate router in a congested network to drop the least visible packets and so maintain overall video quality. We compare our perceptual-quality-based packet dropping approach with the existing Drop-Tail method and with a Hint-Track-inspired cumulative-MSE-based dropping method. The results show that our dropping method produces videos of higher perceptual quality across different network conditions and GOP structures.
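The router-side use of such a model can be illustrated with a minimal sketch. It assumes each queued packet already carries a predicted visibility score (how the score is computed is not shown here), and that congestion is expressed as a byte budget for the queue. The names `Packet`, `visibility`, `drop_least_visible`, and `budget_bytes` are hypothetical and chosen only for illustration; this is not the implementation evaluated in the experiments.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    seq: int            # sequence number (hypothetical field)
    size_bytes: int     # payload size in bytes
    visibility: float   # assumed model output: higher means a loss is more visible


def drop_least_visible(queue: List[Packet],
                       budget_bytes: int) -> Tuple[List[Packet], List[Packet]]:
    """Drop the packets with the lowest predicted visibility until the
    queue fits within budget_bytes. Returns (kept, dropped), both in the
    original queue order."""
    total = sum(p.size_bytes for p in queue)
    dropped_seqs = set()
    # Visit packets from least to most visible, dropping while over budget.
    for p in sorted(queue, key=lambda pkt: pkt.visibility):
        if total <= budget_bytes:
            break
        dropped_seqs.add(p.seq)
        total -= p.size_bytes
    kept = [p for p in queue if p.seq not in dropped_seqs]
    dropped = [p for p in queue if p.seq in dropped_seqs]
    return kept, dropped


# Example: four equal-size packets, room for only two of them.
queue = [
    Packet(seq=0, size_bytes=1000, visibility=0.90),
    Packet(seq=1, size_bytes=1000, visibility=0.10),
    Packet(seq=2, size_bytes=1000, visibility=0.50),
    Packet(seq=3, size_bytes=1000, visibility=0.05),
]
kept, dropped = drop_least_visible(queue, budget_bytes=2000)
```

In this example the packets with sequence numbers 1 and 3 are dropped, whereas a Drop-Tail policy would discard packets by arrival position regardless of their visibility.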