A vision-based automatic tracking system for ocean animals in the midwater has been demonstrated in Monterey Bay, CA. Currently, the input to this system is a measurement of the target's position relative to the tracking vehicle, from which relative velocities are estimated by differentiation. In this paper, the estimation of target velocities is extended to use knowledge of the modal nature of the tracked target's motions and to incorporate the discrete output of an online classifier that categorizes the visually observable body motions of the animal. First, a multiple model estimator is used to impose a more expressive hybrid dynamical model on the target. Then, the estimator is augmented to accept the discrete classification from the secondary vision algorithm as an additional input by recasting the process and sensor models as a dynamic Bayesian network (DBN). By leveraging the information in the body motion classifications, the estimator is able to detect mode changes before the re...