Probabilistic graphical models such as Bayesian Networks have been increasingly applied to many computer vision problems. Accuracy of inferences in such models depends on the quality of network parameters. Learning reliable parameters of Bayesian networks often requires a large amount of training data, which may be hard to acquire and may contain missing values. On the other hand, qualitative knowledge is available in many computer vision applications, and incorporating such knowledge can improve the accuracy of parameter learning. This paper describes a general framework based on convex optimization to incorporate constraints on parameters with training data to perform Bayesian network parameter estimation. For complete data, a global optimum solution to maximum likelihood estimation is obtained in polynomial time, while for incomplete data, a modified expectation-maximization method is proposed. This framework is applied to real image data from a facial action unit recognition proble...