Theoretical bounds on and empirical robustness of score regularization to different similarity measures