In this paper, we proposed a neural-network-based scheme for unsupervised video object segmentation, aimed particularly at videophone and videoconferencing applications. The procedure includes (a) a training algorithm that adapts the network weights to the current conditions, (b) a Maximum A Posteriori (MAP) estimation procedure that optimally selects the data most representative of the current environment as retraining data, and (c) a decision mechanism that determines when network retraining should be activated. The training algorithm takes both the former and the current network knowledge into account in order to achieve good generalization. The MAP estimation procedure models the network output as a Markov Random Field (MRF) and optimally selects the set of training inputs and corresponding desired outputs, using initial estimates of the human face and body. Finally, a verification mechanism is introduced which augments the training data, exploiting information of the previou...
Anastasios D. Doulamis, Nikolaos D. Doulamis, Stef